CHAPTER 1



Perspective 



Advances in integrated circuit technology have had a revolutionary impact on computer
system design. A chip today integrates far greater sophistication and computing power than
ever before. Fabrication processes have progressed rapidly, so that chips with one million
components are a reality, and enthusiasts predict chips with up to one hundred million components
within a decade. Indeed, it is expected that if ion beam etching techniques become viable
for "printing" chips directly, then minimum feature sizes would drop by a factor of ten, thus
allowing a hundred-fold increase in the number of components on a chip.

More significantly, the new technology encourages custom design of special purpose in- 
tegrated systems for solving very large scale sophisticated problems. No longer is it necessary 
to use a single conventional architecture for solving diverse problems. Instead, the computa- 
tional structure of a problem may be mapped directly into hardware. This has shifted the 
emphasis from searching for algorithms, necessarily convoluted to suit a given architecture, to 
efficient hardware design suited to individual problems. 

While this emphasis on greater design flexibility has opened up new directions in comput- 
ing, a number of difficult problems must be addressed before the emerging technologies can be 
effectively exploited. Probably the most significant development in easing the awesome task 
of designing and implementing large systems has been the standardization of design rules and 
the widespread use of standard building blocks. The design methodology expounded by Mead 
and Conway [55], and the development of building blocks such as gate-arrays, PLA's, and 
ROM's have helped shift the emphasis in circuit design from the exclusive domain of electronics
to a higher, more functional level, where aspects of circuit layout may be treated in purely
geometrical terms.

This thesis examines various aspects of the circuit layout problem. We address questions 
such as: why is circuit layout difficult, what properties of a circuit critically determine the 
quality of its layout, and what kinds of heuristics can help solve layout problems efficiently? 
These questions are motivated by the need for general techniques for laying out very large 
circuits. Such basic issues must be addressed before building any automatic or computer-aided 
design and layout system. 

Although the circuit layout problem is not new, progress has been painfully slow. The 
proliferation of diverse technologies and concerns has only exacerbated the layout problem. 
On the one hand we desire to minimize layout area, signal delays, and power dissipation, while 
on the other hand we need to increase reliability through redundancy. In addition, we
require that custom circuits be assembled using standard configurable or restructurable chips 
as building blocks. It is not at all clear whether these different requirements are compatible or 
necessarily contradictory. 

Part I presents a general theory for VLSI graph layout. Not only does the theory identify 
structural properties of circuits that critically determine the quality of layouts, but it also provides
techniques for solving various layout problems. Perhaps the most significant result that emerges 
is a general framework for solving diverse problems in a simple and uniform manner. In 
particular, the unified framework provides a layout technique which is suitable for custom 
layout, and at the same time is efficient with regard to area, delay, and fault-tolerance. Part I 
consists of Chapters 2 through 5. 

Part II examines the channel routing problem. Algorithms for channel routing form the 
basis of many existing automatic layout systems. Although this problem has received wide 
attention over the last decade and a number of heuristic algorithms have been proposed, none 
of these is guaranteed to always determine efficient routings. Approaching this problem from
a theoretical viewpoint, we characterize completely the properties that make channel routing
difficult. Moreover, we provide a novel, linear-time algorithm that is always guaranteed to find
near-optimal solutions. Chapters 6 and 7 constitute Part II of this thesis.

Although the two parts of the thesis investigate different problems, they share a common
underlying philosophy. We begin with a theoretical characterization of the properties that make 
the problems difficult. In the next step, algorithmic techniques are developed for exploiting 
these properties to solve the problems. Although the results in their present form are primarily 
theoretical in nature, the techniques provide new insights and approaches for VLSI layout. It 
is likely that some of the techniques can be adapted for use in practice. 

The remainder of this chapter discusses the two parts of the thesis in more detail, and 
concludes with an outline of the thesis. 

1.1. The Complexity of VLSI Graph Layout 

In recent years a number of interconnection networks have been proposed for solving diverse 
problems. For example, one- and two-dimensional arrays of processors are naturally suited to 
vector and matrix computations [50]. Binary trees are particularly attractive because of their
logarithmic depth and have been proposed for a variety of applications including raster graphics
[27], databases [75], and direct execution of applicative programming languages [54]. The
mesh of trees [19, 44, 57] combines arrays and trees in an elegant manner. By virtue of their
sophisticated structure, networks such as the shuffle-exchange network [73], cube-connected
cycles network [63], and fast-Fourier-transform network [76], on which recursive algorithms
can be programmed naturally, are computationally more versatile and
powerful than the simpler array structures. 

Can we exploit the power of sophisticated networks in VLSI? This question becomes 
increasingly important as problem sizes and the number of processors increase. It might
be relatively simple to fit a thousand-processor array on one chip, but can we fit a
thousand-processor shuffle-exchange network on one chip? Moreover, even if the shuffle-exchange network
fits, will its performance, determined by the clock period or longest delay, be comparable to that of the
array? To answer such questions, and to compare the relative merits of different networks, it 
is necessary to develop a general theory for VLSI graph layout. 

Research in layout theory was initiated by Thompson [79, 80] who proposed a formal model
for VLSI graph layout and investigated area-time tradeoffs for computing certain functions.
Using information-transfer arguments, he obtained strong lower bounds on the layout areas of 
graphs such as the shuffle-exchange and cube-connected cycles graphs. Subsequently, Leiserson
[49, 50] and Valiant [83], focusing on the problem of minimizing layout area, independently
developed a divide-and-conquer layout strategy for general classes of graphs. Using elegant
combinatorial arguments, Leighton [40, 41] showed that the bounds of Leiserson and Valiant
were the best possible in that each class contained graphs for which the bounds were, up to
constant factors, optimal. For some graphs, however, the bounds were very weak.

Layout area is not the only consideration in choosing one layout over a multitude of 
possible layouts. In practice, we desire to fabricate small, inexpensive, and easily testable chips 
which compute quickly and reliably. A large number of important engineering issues need to 
be considered in fulfilling these (possibly conflicting) requirements. 

Propagation delays across long wires critically affect the performance of a circuit layout. In 
pipelined or systolic systems, long delays determine the clock period and overall performance of
the system. Since propagation delay can be reduced by decreasing wire length, it is important to
make the longest wire in the layout as short as possible. Another way to reduce the propagation
delay across a long wire is by increasing the size of the transistor that drives the wire; by
carefully adjusting transistor sizes to match wire lengths, the clock period can be dramatically
reduced. Since wire delays determine the efficiency of a chip, it is imperative that techniques 
to minimize delay be developed within a general theory for VLSI layout. 

Fault tolerance is another important design consideration. Fabrication processes are prone 
to errors so that every wafer invariably contains a small number of defects. Even if a wafer 
contains a number of defective processors, it may still be possible to use the wafer by configuring 
wires around the defective processors. This may, for example, be performed by laser restruc- 
turing techniques [64]. This ability to wire together processors selectively has considerable 
impact on system design. For example, how should a thousand-processor wafer be designed so
that a two-dimensional array can be realized using all the good processors, no matter how the 
defective processors are distributed? 

Another major concern is the problem of assembling large systems. Researchers have 
proposed networks with as many as one million processing elements [54]. Such systems are
clearly too large to fit on a single chip. Whenever a system is larger than a single chip,
it is necessary to partition the system among several chips which can be assembled at the
printed circuit (or chip carrier) level. What is the most effective way to partition a large 
system among several chips? This question is pressing because although fabrication technology 
has been advancing at a rapid pace, the technology for packaging chips has been crawling in 
comparison: current projections indicate as many as one hundred million components per chip 
but not more than two hundred off-chip pin connections. 

The economics of fabrication technology dictates that it is expensive to make one chip, 
but cheap to make many copies. For this reason, manufacturers of custom chips have been 
encouraged to make configurable designs such as gate-arrays, ROM's, and PLA's. The entire 
chip is manufactured, except for one mask. Given a desired configuration of the chip, a 
final layer of metallization connects up the circuitry in that way. Most of the design and 
fabrication costs are thus factored over several chips. Similarly, restructuring techniques allow 
a chip to be modified after fabrication. For example, "diode-busting" is used to configure 
PROM's (programmable read only memory) after fabrication. More recent and exciting is the 
prospect of "laser welding" by which connections between wires can be either made or broken 
after fabrication by high-intensity laser beams. Such techniques further encourage configurable 
design of VLSI chips. Thus, we are led to consider how to design efficient layouts which may 
be configured to realize, for example, arbitrary binary trees or arbitrary rectangular arrays. 

Motivated by the engineering issues outlined above, Part I develops a general framework for 
VLSI graph layout. Within this framework all the diverse concerns mentioned above are dealt 
with in an efficient and uniform manner. The framework is based on a divide-and-conquer 
strategy for graph layout which differs significantly from the divide-and-conquer strategy of 
Leiserson [49, 50] and Valiant [83]. The improved strategy is based on the notion of graph 
bifurcators introduced by Leighton [42], and provides universally close bounds on important 
cost functions such as layout area and propagation delay. The results of Part I are based on 
the papers of Bhatt and Leiserson [8, 9], and Leighton [42]. In addition, the results of Chapters 
4 and 5 appear in [7]. 

1.2. The Complexity of Channel Routing

Although the graph layout problems considered in Part I provide new insights and paradigms 
for VLSI layout, they are nonetheless abstractions of layout problems encountered in practice.
Part II focuses on a specific problem confronting current automatic layout systems. 

Channel routing plays a central role in automated layout systems. Most layout systems 
proceed by first placing modules on a chip, and then wiring together terminals on different 
modules that should be electrically connected. To solve the latter wiring problem, the chip 
is heuristically partitioned into a set of rectangular channels, and each channel is assigned a 
set of wires which are to pass through it. This effectively reduces a difficult "global" wiring 
problem to a set of disjoint (and presumably easier) "local" channel routing subproblems.

An instance of the channel routing problem is specified by a set of terminals located at 
fixed positions on two horizontal tracks. Each set of terminals with the same label constitutes
a net which must be electrically connected by wires running in horizontal tracks and vertical 
columns. Figure 1.1 shows a channel with six nets. Horizontal and vertical wire segments are 
placed on two different layers of interconnect. The objective is to wire up all nets in a way 
that minimizes the channel width, which is the number of horizontal tracks used for wiring. 
For example, Figure 1.2 shows a minimum width wiring of the channel in Figure 1.1. 

The channel routing problem has been intensively studied for over a decade, and many 
heuristic algorithms have been proposed for solving the problem [1, 2, 11, 12, 18, 20, 21, 34, 35,
36, 38, 51, 60, 62, 67, 68, 81, 81]. Recently, Szymanski [77] showed that the general problem is
NP-complete, and with Yannakakis [78] showed that the problem is NP-complete even when
every wire connects exactly two terminals. This might explain why the fast heuristic algorithms
developed thus far either produce arbitrarily bad solutions in many cases or fail completely
on other instances.

Part II of the thesis presents a linear-time algorithm which always produces a near-optimal 
solution. This algorithm is based on the key notion of channel flux which is introduced in 
Chapter 7. The algorithm originally appeared in a paper by Baker, Bhatt, and Leighton [3].

Figure 1.1: A channel with six nets.

Figure 1.2: A minimum width routing.



1.3. Overview 

The next four chapters are devoted to VLSI graph layout, and form Part I of the thesis. 
Chapter 2 outlines Thompson's model for VLSI layout, reviews previous research, and describes 
important layout problems in a formal setting. Chapter 3 focuses on layouts for the simplest 
of networks: binary trees. In addition to presenting new layouts with improved bounds on edge 
lengths, the chapter examines the complexity of producing optimal layouts. The new layout strategy
motivates the paradigm for general graph layout presented in Chapter 4. Finally, Chapter 
5 shows how the new layout paradigm can be used to efficiently solve the important layout 
problems of Chapter 2. 

Part II of the thesis consists of Chapters 6 and 7. Chapter 6 describes the channel routing 
problem, its use in automatic layout systems, and briefly reviews previous research. Chapter 7 
introduces the concept of channel flux and presents a linear-time approximation algorithm for 
Manhattan routing. 

In conclusion, Chapter 8 summarizes the major results of both parts and outlines a number 
of important, unresolved problems. 



CHAPTER 2 



Issues in VLSI Graph Layout 



The first three sections of this chapter introduce the layout model developed by Thompson 
[79, 80] and briefly review previous research in VLSI graph layout. In particular, we discuss the 
layout strategy of Leiserson [49] and Valiant [83] and note that bounds on layout area based on 
separator theorems can be very different from the actual minimum layout area. The remainder 
of this chapter is devoted to formalizing a number of layout questions motivated by engineering 
considerations. 



2.1. The Layout Model 

In order to cast VLSI layout problems within a mathematical framework, Thompson [79, 
80] developed a formal model for VLSI graph layout. The model is based on, and is consistent 
with, the VLSI design rules established by Mead and Conway [55]. It is also similar to the 
widely used Manhattan wiring model. In the Thompson grid model, a layout for a graph is
characterized as an embedding within a two-dimensional grid. A two-dimensional grid is a 
collection of horizontal and vertical tracks spaced apart at unit intervals. A layout for a graph 
G is specified by an embedding which assigns nodes of G to points in the grid where horizontal 
and vertical tracks intersect, together with an (incidence-preserving) assignment of the edges 
of G to paths in the grid. The paths of the layout are restricted to follow along grid tracks 
and are not allowed to overlap for any distance (although a vertical path segment may cross 
a horizontal path segment). In addition, the paths may not cross nodes to which they are not 
adjacent. Since each grid point has only four incident track segments, we restrict our attention
to graphs in which no node has degree greater than four. As an example, Figure 2.1 shows a
layout for the complete graph on four nodes.

Figure 2.1: A layout for $K_4$.

Remark. The results of this thesis extend to variants and generalizations of the Thompson
grid model. For example, graphs with bounded valence greater than four may be laid out by
mapping each node to a region of the grid, instead of a single grid point. The results are also 
applicable to networks with large processors. Techniques for dealing with large processors are 
described more fully in Chapter 5. 



2.2. Elementary Bounds on Layout Area 

Although there are a variety of important engineering considerations in choosing one layout 
for a graph over other possible layouts, the best understood, and perhaps the most desirable 
cost measure to minimize is layout area. The area of a layout is most naturally defined as 
the area of the "bounding-box" around the layout, and equals the product of the number of 
vertical tracks and the number of horizontal tracks that contain a node or wire segment of the 
graph. For example, the layout of Figure 2.1 has area 15. This is not the minimum possible; 
there is another layout with area 9. 
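The area definition can be made concrete with a small sketch. It assumes a hypothetical representation in which each node is a grid point `(column, row)` and each wire is the list of grid points it passes through; it counts the distinct columns and rows touched and multiplies the two counts. This equals the bounding-box area only when no track inside the bounding box is empty, and the code checks that wires follow grid tracks but does not check the model's overlap rules.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, int]  # (column, row) of a grid point


def layout_area(nodes: Dict[str, Point], wires: Iterable[List[Point]]) -> int:
    """Product of the numbers of vertical and horizontal tracks used by the layout."""
    columns = set()
    rows = set()
    for x, y in nodes.values():
        columns.add(x)
        rows.add(y)
    for path in wires:
        # Consecutive points of a wire must be adjacent grid points.
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            if abs(x0 - x1) + abs(y0 - y1) != 1:
                raise ValueError("wire does not follow grid tracks")
        for x, y in path:
            columns.add(x)
            rows.add(y)
    return len(columns) * len(rows)
```

Two nodes on one row joined by a straight wire, `layout_area({"u": (0, 0), "v": (2, 0)}, [[(0, 0), (1, 0), (2, 0)]])`, use three columns and one row, giving area 3.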

How much area does an N-node graph require? Clearly, the area cannot be less than 
the number N of nodes. On the other hand, by embedding nodes at equally spaced intervals 
along a line, and using a distinct horizontal track for each edge (as shown in Figure 2.2), it is 
clear that the area required for an N-node graph is no greater than $O(N^2)$. These bounds are
independent of the structure of the graph and hold for all N-node graphs. In general, however,
the minimum area needed to lay out a graph depends on the graph. 

Figure 2.2: Every N-node graph can be laid out in $O(N^2)$ area.

Thompson [79, 80] identified bisection width as an important property of graphs that affects 
minimum layout area. The bisection width of an N-node graph is the minimum number of 
edges which must be removed from the graph in order to disconnect it into two subgraphs 
each of size at least $\lfloor N/2 \rfloor$. Thompson showed that, up to a constant factor, the layout area
can be no less than the square of the bisection width. Therefore, if the bisection width for 
a graph is known, a lower bound on area can be easily computed. By showing that certain 
computationally powerful graphs such as the shuffle-exchange graph have large bisection width, 
Thompson showed that these graphs require large area. In fact, Thompson extended this 
observation to obtain area-time tradeoffs for computing certain functions. 
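For small graphs, the bisection width can be computed by brute force straight from the definition, and Thompson's result then gives an area lower bound up to an unspecified constant factor. The sketch below is exponential in N and only illustrates the definition; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def bisection_width(n: int, edges: List[Tuple[int, int]]) -> int:
    """Fewest edges crossing a split of nodes 0..n-1 into parts of sizes
    floor(n/2) and ceil(n/2), i.e. both parts have at least floor(n/2) nodes."""
    best = len(edges)
    for part in combinations(range(n), n // 2):
        side = set(part)
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        best = min(best, cut)
    return best


def area_lower_bound_order(n: int, edges: List[Tuple[int, int]]) -> int:
    """Square of the bisection width; the true bound is this times an
    unspecified constant, per Thompson's Omega(B^2) result."""
    b = bisection_width(n, edges)
    return b * b
```

For the complete graph on four nodes every balanced split cuts four edges, so `bisection_width` returns 4, while a four-node path can be split by removing its middle edge alone.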

Leighton [40, 41] identified crossing number as another general property that affects layout 
area. The crossing number of a graph is defined as the minimum number of edge crossings in 
any drawing of the graph in the plane. It is easy to see that the crossing number of a graph is a 
lower bound on layout area. Using more sophisticated arguments for special graphs, Leighton 
also directly obtained lower bounds on total wire length (the sum of the lengths of the wires 
in a layout), which of course is a lower bound on layout area. These techniques are heavily 
dependent on the recursive structure of the special graphs and are generalized in [7]. 



2.3. Layouts Based on Separator Theorems 

Leiserson [49, 50] and Valiant [83] investigated general properties that provide effective 
upper bounds on layout area. They independently developed a divide-and-conquer strategy for 
graph layout and showed, for example, that every N-node tree can be laid out in $O(N)$ area
and that every N-node planar graph can be laid out in $O(N \lg^2 N)$ area. Their technique is
based on the notion of separator theorems for graphs. 

Definition: A class of graphs which is closed under the subgraph relation is said to have
an $f(x)$-separator theorem if there exist constants $a$ and $b$, where $0 < a < 1/2$ and $b > 0$,
such that every N-node graph in the class can be partitioned (by the removal of at most
$b f(N)$ edges of the graph) into disjoint subgraphs having $a'N$ and $(1 - a')N$ nodes, where
$a < a' < 1 - a$.

Given a class of graphs for which a separator theorem is known (e.g., trees have a 1-separator
theorem [52] and planar graphs have a $\sqrt{x}$-separator theorem [53]), it is possible to
construct a layout for any N-node graph in the class by using a simple divide-and-conquer
approach. For example, Leiserson [49, 50] proved the following upper bounds on layout area.

$x^\alpha$-separator theorem      Layout area

$\alpha < 1/2$                    $O(N)$
$\alpha = 1/2$                    $O(N \lg^2 N)$
$\alpha > 1/2$                    $O(N^{2\alpha})$
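The table can be motivated by a standard recurrence. What follows is a sketch under simplifying assumptions, not Leiserson's proof: suppose an N-node graph is bisected twice into four quarters, each quarter is laid out recursively in a square of side $L(N/4)$, the four squares are placed in a 2 × 2 arrangement, and $c N^\alpha$ extra tracks ($c$ an unspecified constant) are added in each direction to route the cut edges. After $\log_4 N$ levels there are $2^{\log_4 N} = \sqrt{N}$ constant-size squares along each dimension, so

$$
L(N) \le 2\,L(N/4) + c\,N^{\alpha}
\quad\Longrightarrow\quad
L(N) \le c\,N^{\alpha} \sum_{i=0}^{\log_4 N - 1} \left(2^{\,1-2\alpha}\right)^{i} + O(\sqrt{N}).
$$

For $\alpha < 1/2$ the ratio $2^{1-2\alpha}$ exceeds 1 and the last term, about $N^{\alpha} \cdot N^{1/2-\alpha} = N^{1/2}$, dominates, so $L(N) = O(\sqrt{N})$. For $\alpha = 1/2$ each of the $\log_4 N$ terms is $c\sqrt{N}$, so $L(N) = O(\sqrt{N} \lg N)$. For $\alpha > 1/2$ the geometric sum is bounded by a constant, so $L(N) = O(N^{\alpha})$. Squaring the side length reproduces the three rows of the table.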

Remark. The layout procedure assumes that a complete recursive decomposition of the graph 
is given. If a complete decomposition is not given, then there is no known polynomial time 
algorithm which achieves the upper bounds on area. This severely limits the applicability of 
separator-based layout strategies to classes of graphs (such as trees or planar graphs) for which
decompositions are easily computed. 

How good are the preceding area bounds? Thompson [79, 80] and Leighton [40, 41] showed 
that none of the bounds can be improved. More precisely, they showed that within each class 
there is a graph for which the bound is optimal. But this does not mean that the bounds are
optimal for every graph within a class. In fact, while the bounds are existentially optimal,
they are not universally optimal. For example, an N-node square grid can be laid out in area
linear in N, but since the minimum separator theorem for the class of square grids is $\sqrt{x}$, the
best bound obtainable by separator-based layouts is $O(N \lg^2 N)$, which is off by a factor of
$\Theta(\lg^2 N)$ from the optimal. Of course, since N-node graphs require area at least N, the bounds
for graphs with $x^\alpha$-separator theorems, $\alpha < 1/2$, are asymptotically universally optimal.

For graphs with larger separator theorems, the discrepancy between the minimum layout 
area and that given in the table can be much worse. Consider, for example, the N-node graph
$S_N$, which consists of $N/\lg N$ disjoint $\lg N$-node expander graphs. An m-node expander graph
has the property that every subset of k nodes is linked by $\Theta(\min(k, m - k))$ edges to the $m - k$
nodes outside the subset.* The bisection width of such a graph is $\Omega(m)$, and hence the minimum
separator theorem is $\Omega(x)$. The existence of trivalent graphs that satisfy this definition has been
known for a long time [28, 31]. In fact, almost all trivalent graphs satisfy this definition. Since
each $\lg N$-node expander graph can be trivially laid out in $O(\lg^2 N)$ area, the layout area of
$S_N$ is no greater than $O(N \lg N)$. However, Leighton [42] showed that the minimum separator
theorem for the class of graphs $S_N$ is at least $\Omega(x/\lg^2 x)$, so that the area bound from the table
above is $O(N^2/\lg^4 N)$, which is much worse than the optimal bound of $O(N \lg N)$.
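The expansion condition can be examined numerically for a small graph by computing, over all node sets S with $|S| \le m/2$, the ratio of edges leaving S to $|S|$. For a family of bounded-degree graphs, the $\Theta(\min(k, m - k))$ condition holds exactly when this ratio stays bounded below by a positive constant as m grows; the upper end is automatic, since each node has at most three incident edges. The brute-force sketch below is exponential in m, and its name is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def edge_expansion(m: int, edges: List[Tuple[int, int]]) -> float:
    """Minimum over node sets S with 1 <= |S| <= m/2 of (edges leaving S) / |S|.

    A set and its complement have the same boundary, and min(|S|, m - |S|) = |S|
    when |S| <= m/2, so only the smaller side needs to be enumerated.
    """
    if m < 2:
        raise ValueError("need at least two nodes")
    best = float("inf")
    for k in range(1, m // 2 + 1):
        for part in combinations(range(m), k):
            side = set(part)
            leaving = sum(1 for u, v in edges if (u in side) != (v in side))
            best = min(best, leaving / k)
    return best
```

A six-node cycle, for instance, can be cut into two paths of three nodes by removing two edges, so its ratio is 2/3; along the family of all cycles this ratio tends to zero, which is why cycles are not expanders.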

Remark. Any class of graphs closed under the subgraph relation and containing $S_N$ must
also contain expander graphs. Hence, the minimum separator theorem (as defined earlier) for
the class is $\Theta(x)$. Instead of defining separator theorems for classes of graphs closed under the
subgraph relation, it is more convenient (and general) to define separators for individual graphs
in terms of the subgraphs produced by their recursive decomposition. Using the less restrictive
(but more useful) definition, it is possible to show that $S_N$ has an $O(N/\lg N)$-separator. The
$\lg N$-node expander graphs are split in the upper levels of the decomposition and never appear
intact as subgraphs in the lower levels of the decomposition. Leighton [42] proved that even
using the most liberal definition, the minimum separator for $S_N$ is at least $\Omega(N/\lg N)$. Any
bound on layout area for $S_N$ based on the minimum separator can therefore be no less than
$\Omega(N^2/\lg^4 N)$.

Thus, while the divide-and-conquer strategy based on separator theorems gives existentially
optimal bounds, the bounds can be unacceptably poor in a universal sense. It was the discovery
of such large discrepancies that led to the search for an alternative framework for VLSI layout.
Within the new framework presented in Chapter 4 we shall see how these large discrepancies
are overcome.

*The original definition of expander graphs is slightly different from that given here. We adopt this minor
variant because it allows nodes of degree no greater than three.

2.4. Eight VLSI Graph Layout Problems 

As mentioned earlier, there are many important considerations in choosing one layout over
a multitude of other possible layouts. The problems in this section are motivated by some 
engineering concerns fundamental to circuit design and layout. Though not exhaustive, this 
list covers most of the theoretical issues studied recently. Many of the problems are known 
to be NP-Complete. The emphasis throughout this thesis is the development of a general 
unifying framework for dealing with diverse issues in a uniform manner. Within the framework, 
solutions to some problems are reasonably close to optimal. For other problems, good heuristics 
are developed or suggested, and general bounds obtained. 

Problem 1. Given a graph G, produce an area-efficient layout for G.

As mentioned before, minimizing area is a critical concern in VLSI circuit layout. In 
addition to the work on area-efficient layouts described in the previous section, Dolev, Leighton,
and Trickey [22] have shown that determining the minimum layout area of a forest of trees is 
NP-Complete. 

Problem 2. Given a graph G, produce an area-efficient layout for G with minimax edge 
length. 

Besides area, speed is another critical factor in chip performance. Signals do not propagate 
instantaneously across wires, and the longer the wire, the longer the propagation delay. In 
pipelined or systolic systems, the effect of propagation delays is even more dramatic. The 
maximum delay determines the clock period, and hence the throughput, of the system. To
maximize throughput we need to minimize the maximum delay. In short, we must produce
layouts so that the longest edge is as short as possible. The minimum, over all layouts, of the
length of the longest edge is called the minimax edge length.
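For one given layout, the longest edge and the total wire length are easy to measure; the minimax edge length is the minimum of the former over all layouts, which the sketch below does not attempt. It assumes the same hypothetical representation of a wire as the list of grid points it visits, so each unit step along a track contributes length 1; the function names are illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (column, row) of a grid point


def wire_length(path: List[Point]) -> int:
    """Length of a wire that follows grid tracks: one unit per step."""
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive points must be adjacent grid points")
    return len(path) - 1


def edge_length_stats(wires: Dict[str, List[Point]]) -> Tuple[int, int]:
    """Return (longest edge, total wire length) for one given layout."""
    if not wires:
        raise ValueError("layout has no wires")
    lengths = [wire_length(path) for path in wires.values()]
    return max(lengths), sum(lengths)
```

A layout with one wire of two steps and another of one step yields `(2, 3)`: the longest edge has length 2 and the total wire length is 3.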

Paterson, Ruzzo and Snyder [59] studied the problem of minimizing edge lengths for 
complete binary trees. They showed that the minimax edge length of an N-node complete
binary tree is $\Theta(\sqrt{N}/\lg N)$. Adopting a different strategy based on separator theorems, the
next chapter presents a general technique for bounding the maximum edge length of arbitrary 
trees, while Chapters 4 and 5 extend the techniques to general graphs. The next chapter also 
shows that minimizing the edge lengths of trees is NP-complete. 

Problem 3. Given a graph, produce an area-efficient layout in which each wire has
bounded delay in the capacitive model. 

Although it is certainly true that propagation delay across a wire depends on the length 
of the wire, there has been little consensus on how fast propagation delay grows as a function 
of wire length. Thompson [79, 80] assumes propagation delay to be constant, independent of
wire length. This might seem unreasonable given the ultimate speed-of-light limitation, which
indicates that the delay increases linearly with length. The speed-of-light limitation, however, 
greatly exaggerates the importance of wire delay in determining the speed of circuits. Mead 
and Conway [55] take into account some of the electrical characteristics of interconnections on 
MOS integrated circuits, and emphasize the role of wire capacitance in determining propagation 
delay. Recent analysis by Bilardi, Pracchi, and Preparata [10] strongly supports the belief that 
capacitive effects play the predominant role in determining the speed of MOS circuits. 

In a capacitive model, each wire is assumed to present a purely capacitive load to the 
transistor that drives a signal across the wire. This load is proportional to the length of the 
wire plus the area of the transistor that receives the signal. The delay is proportional to 
this load divided by the area of the driving transistor. By increasing the size of the driving 
transistor it is therefore possible to bound the propagation delay, independent of the length of 
the wire. A second well-known technique for reducing delay across a long wire is to "ramp" 
the wire with a geometrically increasing series of inverters [55]. The number of intermediate 
drivers, and hence the delay, is logarithmic in the length of the wire, but an attractive feature 
is that this process can be carried out without the need to resize the original transistors in the 
circuit.
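Both delay-reduction techniques can be sketched under the capacitive model as just stated, with every proportionality constant set to 1 (an assumption). In `ramp_delay`, a hypothetical helper, inverters of area $a r, a r^2, \ldots$ are appended to a driver of area $a$ until the last one is at least as large as the load. Each hop between successive inverters then costs $r$, and the last stage drives the load, so the delay grows with the logarithm of the load. The sketch ignores signal polarity, which in practice may require an even number of inverters, and it folds the receiving transistor's area into `load`.

```python
from typing import Tuple


def driver_area_for_delay(wire_length: float, receiver_area: float, max_delay: float) -> float:
    """Driver area making (wire_length + receiver_area) / driver_area <= max_delay."""
    if max_delay <= 0:
        raise ValueError("max_delay must be positive")
    return (wire_length + receiver_area) / max_delay


def ramp_delay(load: int, driver_area: int, ratio: int) -> Tuple[int, float]:
    """Return (number of added inverters, total delay) for a geometric inverter chain."""
    if driver_area <= 0 or ratio < 2:
        raise ValueError("need driver_area > 0 and ratio >= 2")
    stages = 0
    # Grow the chain until the last inverter is at least as large as the load.
    while driver_area * ratio ** stages < load:
        stages += 1
    last_area = driver_area * ratio ** stages
    # Each inverter-to-inverter hop costs `ratio`; the final stage drives the load.
    delay = stages * ratio + load / last_area
    return stages, delay
```

With a unit driver and a load of 1000, direct driving costs 1000 time units, while `ramp_delay(1000, 1, 10)` adds three inverters and gives a total delay of 31.0.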

Of course, increasing the size of one transistor or introducing new transistors might force 
some wires to be stretched to avoid the enlarged transistor area. In other words, decreasing 
the delay across one wire might force an increase in delay over other wires. Leiserson [47] and
Mehlhorn [56] independently posed the question of whether or not the transistors in a layout
could be resized so that every wire in the layout has constant propagation delay. Ramachandran
[65] investigated the problem of introducing intermediate drivers along long wires to decrease
delays, but under the constraint that the topology of the layout remain unchanged. With the
restriction that wires cannot be rerouted, she showed that logarithmic delay can be achieved,
but at the expense of squaring the layout area in the worst case. We allow the layout topology 
to be changed, and obtain significantly better results. 

Problem 4. Given a graph G, produce a layout for G with few wire crossings.

An undesirable feature of layouts is the presence of a large number of wire crossings.
When two wires cross, they must be on different layers. For faster operation, and less power
dissipation, it is advantageous to maximize the total amount of wiring on a layer of low
resistance, e.g. the metal layer, while minimizing the wiring on a layer of high resistance,
e.g. the polysilicon layer. The wiring on the high-resistance layer may be reduced by using that
layer only just before and after two wires cross. If the number of wire crossings is small, the
number of contact cuts which connect wire segments on different layers is small, so that the area
of the layout is not blown up by the contact cuts, which occupy large area. In addition, long
wires that are crossed by many other wires are susceptible to cross-talk when all the crossing 
wires simultaneously carry the same signal. 

The crossing number of a graph is defined to be the minimum number of wire crossings in 
any drawing of the graph on the plane. Leighton [40, 41] proved upper and lower bounds on 
crossing numbers and then used the results to find bounds on layout area. Garey and Johnson
[29] showed that determining the crossing number of bipartite graphs is NP-complete.
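The crossing number is a minimum over all drawings, which is what makes it hard to compute. Counting the crossings of one given drawing, by contrast, is straightforward. The Python sketch below assumes straight-line edges between given points, a simplification since the layouts considered here use rectilinear wires. It counts pairs of edges that cross at an interior point, so it gives an upper bound on the crossing number rather than the crossing number itself.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point (collinear overlaps ignored)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_crossings(pos, edges):
    """Number of crossing edge pairs in one straight-line drawing of a graph."""
    total = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # edges sharing an endpoint do not count as a crossing
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            total += 1
    return total


k4 = list(combinations(range(4), 2))                 # complete graph on four nodes
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
nested = {0: (0, 0), 1: (4, 0), 2: (0, 4), 3: (1, 1)}
print(count_crossings(square, k4), count_crossings(nested, k4))
```

For the complete graph on four nodes, the square drawing with both diagonals has one crossing. Placing one node inside the triangle formed by the other three gives none, so that graph's crossing number is 0.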




Problem 5. Given a graph, produce an area-efficient regular layout for the graph.

Some design methodologies, most notably gate-arrays, require that processors be located 
at fixed positions on a chip. In gate-arrays the processors are placed in a grid pattern with 
uniform spacing between processors adjacent along every row and column. Such layouts are 
said to be regular. An important advantage of this design restriction is its flexibility: even if 
the size of every processor is increased, the wiring between processors remains unaffected and 
the total area remains proportional to the sum of the wire area (as computed with unit-size 
processors) and the processor area. This is because only the √N rows and columns containing
the N unit-size processors need to be expanded to accommodate the non-unit-size processors. In
non-regular layouts, every row and column might have to be expanded since there might be a
node in every row and in every column. Increasing the linear dimension of the processors by a
factor of s could then result in a Θ(s²) increase in layout area.
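A back-of-the-envelope calculation shows why regularity matters. The sketch assumes, purely for illustration, three things. The layout has side L when drawn with unit-size processors. N is a perfect square, so the regular layout has √N processor rows and √N processor columns. The processors are then enlarged to s × s. The specific numbers used are hypothetical.

```python
import math


def regular_side(L, N, s):
    """Side after enlarging processors to s x s in a regular layout of side L.

    Only the sqrt(N) processor rows (and, symmetrically, columns) widen, each by s - 1.
    """
    return L + math.isqrt(N) * (s - 1)


def nonregular_side_worst(L, s):
    """Worst case for a non-regular layout: every row and column holds a processor."""
    return L * s


N, L, s = 1024, 200, 10          # hypothetical sizes
reg = regular_side(L, N, s)      # 200 + 32 * 9
non = nonregular_side_worst(L, s)
print(reg * reg, non * non)

# The regular area stays within a constant factor of wire area plus processor area.
assert reg * reg <= 2 * L * L + 2 * N * (s - 1) ** 2
```

In the regular case the area stays close to the sum of the wire area and the processor area. In the worst non-regular case the whole layout is scaled by s in each dimension.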

Previous divide-and-conquer layout strategies do not produce regular layouts. Hence, they
are not useful in laying out circuits with non-unit-size processors. A good strategy for producing 
regular layouts would solve the nagging problem of how to cope with variable-size processors. 

Problem 6. Design area-efficient chips that can be configured to realize a large number
of graphs. 

Because it is expensive to make one chip but cheap to make many copies, manufacturers of 
custom chips have been encouraged to make configurable designs such as gate-arrays, ROM's 
and PLA's. In such designs, the entire chip is prefabricated except for one layer. The customer 
then specifies a configuration for the chip, and the final layer of metallization connects up
the circuitry in that particular way. Hence, most of the design and fabrication costs can be
amortized over many custom chips. Similarly, the fast emerging laser-restructuring technology
[64] provides another economical way to customize chips after fabrication is complete. Laser 
restructuring allows connections between wires to be made or broken after the chip has been 
fabricated. In either case, it is desirable to design layouts that can be configured from one of 
a few basic patterns. 

Problem 7. On a wafer which has arbitrarily distributed defective cells, realize a given
graph on the good cells. 

In any fabrication process, it is expected that some of the processing cells will be defective. 
In a two-dimensional array of cells on a wafer in which defective cells are arbitrarily distributed, 
it may still be possible to use the wafer by configuring wires around the defective cells. This 
may, for example, be performed by laser restructuring techniques [64]. Given this ability to 
isolate defective cells, it is important to consider how a graph may be realized on the remaining 
good cells. This problem has received considerable attention recently [33, 45, 69]. The problem 
is similar to the general graph layout problem in the Thompson model but with the important 
restriction that nodes of the circuit can only be mapped to a restricted set of nodes in the grid. 

Problem 8. Given a graph G, assemble G using the minimum number of copies of a 
single chip having few external pin connections. 

A number of very large networks have been proposed in recent years for implementing 
priority queues [48], for searching [5], for direct execution of applicative programming languages 
[54], and for recognizing regular expressions [26]. Some of these networks are too large to fit
on a single chip. For example, the tree-structured network of [54] is envisioned to contain 
as many as one million processing elements. Clearly, such networks must be partitioned over 
many interconnected chips, so that each chip realizes a small portion of the network. 

The technology for packaging chips severely limits the number of external pin connections 
on a chip. While chips with over a million components are foreseeable in the near future, no one
predicts a chip with over two hundred external pin connections. This poses a pressing problem 
in assembling large networks of processors. 

Even if a network could be partitioned so that each portion has only a few external 
connections, it would be economically infeasible to design each chip individually. For instance,
it would be prohibitively expensive to design one thousand different chips, each containing a
thousand processing elements, to assemble a network of one million processors. For this reason,
it is necessary to assemble large systems using copies of a few configurable or restructurable
chips. The next chapter presents one solution to the problem of assembling large tree structures
using copies of a single, area-efficient, restructurable chip with few external pin connections.

Within the new framework, efficient solutions are provided for each of these problems. In 
fact, a single layout simultaneously solves many of these problems efficiently. The framework 
provides a two-step strategy for solving these problems. First, the graph to be laid out is
embedded within a very special network called the tree of meshes. For the tree of meshes it is 
possible to solve all these problems efficiently. In the second step, therefore, a good layout for 
the tree of meshes also solves these problems for the embedded graph. 



CHAPTER 3 



Layouts for Trees 



A binary tree may not be the best multiprocessor organization, but it has been proposed by 
many researchers for a variety of reasons. For example, a complete binary tree can be the major 
component of a priority queue resource [48] and of a smart-memory raster graphics system [27]. 
A complete binary tree can also serve as a hardware structure for searching [5], for databases 
[75], or for direct execution of applicative programming languages [54]. Browning [15] proposes
a complete binary tree for general-purpose multiprocessing, and two systems based on her ideas 
are being built at Caltech and Bell Laboratories. 

Attention is also directed to binary trees which are not complete. Floyd and Ullman [15] 
show that strings described by a regular expression can be recognized by processing elements 
organized as the parse tree of the regular expression. Foster and Kung [25] have a similar
scheme based on the simple configurable layout developed by Leiserson [50]. There are other
proposals, for example [58, 74], of machine organizations that, while not trees, are nevertheless
tree-like.

We shall not debate the merits of the various tree machines here, but shall confine ourselves 
to understanding their physical organization. In this regard trees are particularly attractive 
because of their simple interconnection structure. Not only can trees be laid out efficiently, but 
good layouts for trees also suggest efficient ways to lay out general graphs. Moreover, problems 
that are intractable for trees are also intractable in general. Thus, by investigating layouts for
trees we stand to learn more about general graph layout. 

In the following section we examine two well-known layouts for complete binary trees and
present a better layout, which minimizes (asymptotically) both area and maximum edge
length. These bounds are extended to arbitrarily structured trees in Section 3.2, and to planar
layouts for trees in Section 3.3. Computing the minimum edge length exactly is shown to be
NP-complete in Section 3.4. Section 3.5 describes Leiserson's [50] assembly of large complete
trees using multiple copies of a single chip with only four external pin connections. Section
3.6 introduces and examines the two-color bisection problem for arbitrary trees. Section 3.7
presents one way to assemble large arbitrarily structured trees using the minimum number of
copies of a single restructurable chip with few pins.

Figure 3.1: An O(N lg N)-area layout of a complete binary tree.



3.1. Layouts for Complete Binary Trees 

In addition to their usefulness in speeding up computation time by allowing both parallelism
and pipelining, complete binary trees are attractive also because they can be laid out
efficiently. Figure 3.1 shows the naive layout of a complete binary tree. Since the height of an
N-leaf tree is lg N, and the N leaves are spread out over a line of length 2N, it follows that
the area of the layout is 2N lg N. Furthermore, the longest edges are at the top level and their
length is roughly N/2.
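The dimensions of the naive layout can be computed directly. The sketch assumes one reading of Figure 3.1. The N leaves (N a power of two) sit two units apart on the bottom row. Each parent sits one row above the midpoint of its two sons. An edge's length is its rectilinear (Manhattan) length.

```python
def naive_layout(N):
    """Width, height and longest edge of the naive layout of an N-leaf complete binary tree.

    Assumed geometry: leaves on row 0 at x = 0, 2, ..., 2N - 2; each parent one row
    above the midpoint of its two sons; edge length = Manhattan distance.
    """
    row = [2 * i for i in range(N)]       # x-coordinates of the current level
    longest, y = 0, 0
    while len(row) > 1:
        parents = [(row[i] + row[i + 1]) // 2 for i in range(0, len(row), 2)]
        for i, px in enumerate(parents):
            for cx in (row[2 * i], row[2 * i + 1]):
                longest = max(longest, abs(px - cx) + 1)   # horizontal run plus one row up
        row, y = parents, y + 1
    return 2 * N - 1, y + 1, longest


for N in (8, 1024):
    print(N, naive_layout(N))
```

The width is about 2N and the height is lg N + 1, giving an area of about 2N lg N. The longest (root) edges have length N/2 + 1.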

The familiar H-tree layout in Figure 3.2 was originally proposed by Mead and Rem [55].
In contrast to the naive layout, which is in a sense one-dimensional, this layout exploits both
dimensions symmetrically. If S(N) is the side of the layout, then S(1) = 1 and,
more generally,

$$S(N) = 2\,S(N/4) + 1,$$

which yields S(N) = 2√N − 1. Consequently, the area of the layout is no greater than 4N.
The longest edges are again at the top level, and their length is no more than ½√N.
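The closed form of the H-tree recurrence is easy to confirm numerically. The sketch assumes N is a power of 4. It also assumes the root lies in the middle column with each son at the centre of its half. That geometry is a reading of Figure 3.2 used for the second function, not a statement from the text.

```python
import math


def htree_side(N):
    """Side S(N) of the H-tree for N leaves (N a power of 4): S(1) = 1, S(N) = 2 S(N/4) + 1."""
    return 1 if N == 1 else 2 * htree_side(N // 4) + 1


def htree_top_edge(N):
    """Root-to-son distance: the root sits in the gap column between two quarters of side
    S(N/4), and each son sits at the centre column of its quarter."""
    return (htree_side(N // 4) + 1) // 2


for k in range(1, 8):
    N = 4 ** k
    root_N = math.isqrt(N)
    assert htree_side(N) == 2 * root_N - 1          # closed form of the recurrence
    assert htree_side(N) ** 2 <= 4 * N              # area bound
    assert htree_top_edge(N) == root_N // 2         # top-level edge length is sqrt(N)/2
```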







Figure 3.2: The H-tree layout of a complete binary tree.



The H-tree layout asymptotically minimizes area but not maximum edge length. Paterson,
Ruzzo, and Snyder [59] demonstrated a linear-area layout with maximum edge length
O(√N/lg N). In any layout there are two nodes at distance at least about √N; moreover, these
two nodes are connected by a path containing no more than 2 lg N tree edges. It follows
that at least one of these edges must have length at least about √N/(2 lg N). Thus, the layout
of [59] asymptotically minimizes area as well as maximum edge length. Unfortunately, however, the
layout technique of [59] does not extend to more general graphs. The remainder of this section
demonstrates another layout with asymptotically optimal area and maximum edge length. The
following section generalizes our technique to arbitrary trees, and the next chapter to general
graphs.

To illustrate our technique, consider the layout of Figure 3.3, in which the nodes at the
second and third levels of the tree have been brought closer to the root so that all edges within
the top four levels are of equal length. This "averaging" of edge lengths reduces the maximum
edge length from ½√N to a smaller constant multiple of √N. Of course, the layout is stretched in
the middle in order to accommodate two edges instead of one. This increases the area of the
layout, but only slightly, from 4N to 4N + 6√N.

This averaging operation can be carried out further down the tree so that many levels 
are brought closer towards the root. In order to space top levels of the tree closely together, 
we embed these levels within an H-channel structure shown in Figure 3.4. This structure is
obtained by taking the H-tree layout of a complete binary tree and blowing up the layout in
both dimensions by a suitable factor. The details of the embedding are described next.

Figure 3.3: The H-tree layout with shorter edges at the top levels.

Figure 3.4: The H-channel structure.



Theorem 3.1. An N-node complete binary tree can be embedded in linear area with maximum
edge length O(√N/lg N).

Proof. To lay out a complete binary tree with N leaves, start with the H-tree layout of
a complete binary tree with lg² N leaves, which has area at most 4 lg² N and maximum edge
length ½ lg N. Blow up this layout in either dimension by a factor α√N/lg N, where α is a
constant specified later. The area of the layout becomes 4α²N and the longest channel has
length ½α√N.




Next, lay the root at the centre of the H-channel structure and place the second level nodes
at distance β√N/lg N from the root on either side. Once again, β is a constant specified later.
Place lower levels of the tree as shown in Figure 3.4, with successive levels spaced equally apart.
At every corner of the H-channel structure, bisect the tree so that the subtrees embedded
within the two substructures are of equal size. Finally, in the lowest level channels lay out the
remaining subtrees in the H-tree manner.

We must ensure that every channel is wide enough to accommodate all the nodes in any
level embedded within it, and also that the H-tree layouts in the final step fit within the lowest
level channels. To satisfy these conditions, let us first calculate the total number of tree levels
embedded in all but the lowest level channels. The total length of all channels encountered
from the centre of the layout to the end of a terminal channel does not exceed the quantity
2α√N. Since the distance between successive tree levels is β√N/lg N, the number of tree
levels embedded is bounded by (2α/β) lg N. The total number of tree nodes within any one
of these levels is therefore no greater than N^{2α/β}. If 2α/β < 1/2, then the number of nodes
in any level is asymptotically less than the width of a channel, which equals β√N/lg N. The
first condition is therefore satisfied by having α < β/4.

To ensure that the H-tree layouts at the final step fit within the final channel, it suffices
to check that the dimensions of the layout are smaller than the dimensions of the channel.
The size of a subtree embedded within a final-level channel cannot be more than N/lg² N
because the tree is split into half at each corner. The side of the H-tree layout is no greater
than 2√N/lg N. By choosing α > 2, the side of the channel is guaranteed to be larger than
a side of the H-tree layout. Therefore, by choosing α > 2 and β > 4α, we see that the layout
can be completed. Finally, the area is linear in N and the maximum edge length is bounded
by O(√N/lg N). ∎
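The final-step check in this proof can be evaluated for concrete values. The sketch below uses hypothetical constants α and β and covers only the non-asymptotic part. It tests whether an H-tree for the N/lg² N nodes of a final-level channel fits inside a channel of side α√N/lg N. The condition on nodes per level is asymptotic and is left out.

```python
import math


def theorem_3_1_checks(N, alpha, beta):
    """Evaluate the final-step conditions from the proof of Theorem 3.1.

    N is the number of leaves (a power of two); lg denotes log base 2.
    """
    lg = math.log2(N)
    channel_side = alpha * math.sqrt(N) / lg          # a unit wire blown up by alpha*sqrt(N)/lg N
    final_size = N / lg ** 2                          # nodes left for one final-level channel
    htree_side = 2 * math.sqrt(final_size) - 1        # H-tree side bound 2*sqrt(M) - 1
    return {
        "levels_in_h_channel": 2 * alpha / beta * lg, # tree levels above the final channels
        "final_layout_fits": htree_side <= channel_side,
        "constants_ok": alpha > 2 and beta > 4 * alpha,
    }


print(theorem_3_1_checks(2 ** 20, alpha=2.5, beta=12.0))
print(theorem_3_1_checks(2 ** 20, alpha=1.0, beta=5.0))
```

With α = 2.5 the final H-tree fits. With α = 1 it does not, which matches the requirement α > 2.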



3.2. Layouts for Arbitrary Binary Trees 

One property of complete binary trees crucial to the layout of Theorem 3.1 is that a 
complete binary tree can be bisected into two equal size subtrees simply by removing the root. 
At every corner in the H-channel structure, a forest of complete trees is bisected into two equal
halves, each "growing" in opposite directions. This controls the size of every subgraph at the 
final level so that a standard layout fits within a final-level channel. 

Arbitrarily structured binary trees are only slightly harder to bisect. Any N-node binary
tree can be separated into two components, each with no more than ⌊2N/3⌋ + 1 nodes, by
removing a single edge [52]. (The worst case occurs for the four-node tree in which one node is
adjacent to three others.) Either of the two components might be a forest, but the same result
applies to forests, so that the binary tree can be split recursively. By recursively splitting the
larger component, a tree can be bisected by cutting at most O(lg N) edges, or by removing
the nodes incident to these edges. The O(lg N) bound follows because the subgraphs decrease
geometrically in size with each cut.
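The single-edge split is easy to find from subtree sizes. The sketch below is an illustration rather than the construction of [52]. It roots the tree anywhere, computes every subtree size in one pass, and keeps the edge whose larger side is smallest. The adjacency-dict representation is an assumption.

```python
def best_edge_split(adj, root=0):
    """Edge whose removal minimizes the larger component.

    adj maps each node of a tree to its list of neighbours.
    Returns ((parent, child), size_of_larger_component).
    """
    n = len(adj)
    parent = {root: None}
    order = [root]
    i = 0
    while i < len(order):                 # breadth-first order from the root
        v = order[i]
        i += 1
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
    size = {v: 1 for v in order}
    for v in reversed(order):             # children are processed before parents
        if parent[v] is not None:
            size[parent[v]] += size[v]
    # Cutting edge (parent[v], v) leaves components of sizes size[v] and n - size[v].
    best = min((v for v in order if parent[v] is not None),
               key=lambda v: max(size[v], n - size[v]))
    return (parent[best], best), max(size[best], n - size[best])


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(best_edge_split(star))
```

On the four-node star it returns a larger side of 3 = ⌊2·4/3⌋ + 1, the worst case noted in the text.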

The property that all trees have small bisections was used by Leiserson [49, 50] and Valiant
[83] to show that all trees have linear-area layouts. We strengthen this result to show that
every N-node tree has a linear-area layout with maximum edge length O(√N/lg N). The details
of the layout are described in the following theorem.

Theorem 3.3. Every N-node tree can be embedded in linear area with maximum edge
length O(√N/lg N).



Proof. As before, begin with the H-tree layout of a complete binary tree with lg² N leaves,
and blow up the layout in either dimension by a factor α√N/lg N, where α is a constant
specified later. The area of the layout becomes 4α²N and the longest channel has length
½α√N.

Find a set of O(lg N) nodes which bisect the tree and locate them at the center of the
layout. Place nodes of the tree in breadth-first levels starting with the bisector set as the roots
of the search, so that consecutive levels are distance β√N/lg N apart (β is a constant specified
later). At every corner of the H-channel structure, bisect the remaining forest of subtrees so
that the subforests embedded within the two substructures are of equal size. Add the new
bisector set to the set of nodes from the previous breadth-first level, as shown in Figure 3.5.
In the new channel, start with the updated set as the root of a breadth-first search and repeat
the procedure used before. Finally, in the lowest level channels lay out the remaining subtrees
using the standard divide-and-conquer layout of Leiserson [49, 50] or Valiant [83].

Figure 3.5: Inserting new bisector sets at every corner.

As before, we need to ensure that every channel is wide enough to accommodate all the nodes
embedded within any level, and also that the layouts in the final step fit within the lowest level
channels.

Let us first calculate a crude upper bound on the total number of nodes embedded in any 
one breadth-first level. This quantity is certainly less than the total number of nodes embedded 
in all but the final-level channels. To bound the latter quantity, suppose that nodes in each 
bisector set within the H-channel structure are pulled in to the center of the layout, and the 
remaining nodes placed in breadth-first levels until the final-level channels. Bringing all the 
bisector sets towards the center can only increase the number of nodes in all but the final-level 
channels. Since an N-node tree has a bisector of size O(lg N), the total number of nodes within
the union of all bisector sets is bounded by: 



$$O\Bigl(\sum_{i=0}^{2\lg\lg N} 2^i \lg\frac{N}{2^i}\Bigr) = O(\lg^3 N).$$



The total length of all channels encountered from the centre of the layout to the end
of a final-level channel does not exceed 2α√N. Since the distance between successive tree
levels is β√N/lg N, the number of tree levels embedded within the H-channel is bounded by
(2α/β) lg N. Starting with O(lg³ N) nodes as the roots of a breadth-first search, the number
of nodes encountered in (2α/β) lg N levels cannot exceed O(N^{2α/β} lg³ N). Since every node
embedded within the H-channel must be in one such breadth-first level, the previous quantity
also bounds the total number of nodes within the H-channel structure. By choosing 2α/β <
1/2, or α < β/4, we see that the width of a channel asymptotically exceeds the number of
nodes in any level within the channel. Therefore, the first condition is satisfied by having
α < β/4.

To ensure that the layouts at the final step fit within a final-level channel, it suffices to
check that the dimensions of a layout generated by the Leiserson-Valiant strategy are smaller
than the dimensions of the channel. Their layout of an x-node tree has area linear in x, i.e.,
bounded by γx for all x and some constant γ. In the layout described above, the size of a forest
embedded within a final-level channel cannot be more than N/lg² N because the tree is split
into half at each corner. The side of a layout at the final level is therefore no greater than
√(γN)/lg N. By choosing α > √γ, the side of the channel is guaranteed to be larger than a side
of this layout. Therefore, by choosing α > √γ and β > 4α, we see that the layout can be completed.
Finally, the area is linear in N and the maximum edge length is bounded by O(√N/lg N). ∎
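The O(lg³ N) bound on the total size of the bisector sets can be sanity-checked numerically. The sketch drops the hidden constant and takes each bisector to be exactly lg of its forest's size, which is an assumption. It sums over the 2^i corners at each level i ≤ 2 lg lg N.

```python
import math


def bisector_total(N):
    """Total bisector size: 2^i corners at level i, each cutting a forest of N / 2^i nodes.

    Hidden constants are dropped (each bisector is taken to be exactly lg of its forest size).
    """
    lg = math.log2(N)
    top = int(2 * math.log2(lg))                   # corner levels 0 .. 2 lg lg N
    return sum(2 ** i * math.log2(N / 2 ** i) for i in range(top + 1))


for e in (16, 32, 64, 128):
    print(e, round(bisector_total(2 ** e) / e ** 3, 3))   # ratio to lg^3 N
```

The sum of 2^i for i up to 2 lg lg N is less than 2 lg² N, and each logarithm is at most lg N. The ratio therefore never exceeds 2.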

3.3. Planar Layouts for Trees 

It is sometimes necessary to produce layouts in which distinct edges do not cross one 
another. Planar layouts have the advantage that only one layer of interconnect is required; by 
using a low-resistance metal layer, the resulting circuit is not only faster, but also dissipates less 
power. Many current automatic layout systems reserve a single layer of interconnect for special 
purposes such as power and ground connections. In such cases, it is necessary to
find good planar layouts. Needless to say, the underlying connection scheme must be planar. 

Planar layouts may require much more area than non-planar layouts. In particular, Valiant
[83] demonstrated an N-node planar graph for which every planar layout occupies at least
Ω(N²) area and has edges of length Ω(N). On the other hand, Leiserson [49, 50] and Valiant
[83] showed that every N-node planar graph can be laid out in O(N lg² N) area with edges of
length O(√N lg N) in Thompson's layout model, which allows distinct wires to cross.

Valiant [83] further showed that every tree has a linear-area planar layout. In other words, 
the planarity restriction does not affect the asymptotic area requirements of trees. But what
about edge length? Intuitively, the length of a wire can be reduced by taking a short-cut across
another wire, instead of going around it. So, an important question is whether the planarity 
requirement affects the maximum edge length for trees. 

Although the layout of Section 3.2 has linear area and asymptotically optimal edge length
in the worst case, it is not guaranteed to be planar. However, Ruzzo and Snyder [70] showed
that this layout could be transformed into a planar layout without increasing edge length 
asymptotically. The details of their transformation are fairly complicated; in the following 
Theorem, we present a simpler transformation. 

Theorem 3.4. Every N-node tree has a linear-area planar layout with maximum edge
length O(√N/lg N).



Proof. The layout proceeds exactly as in the proof of Theorem 3.3, with particular
attention paid to the way the bisector set is chosen and to the ordering of nodes within the set.
In particular, if a forest of x nodes has to be separated from an N-node tree, x ≤ ⌊N/2⌋,
then it suffices to remove at most ⌈lg x⌉ nodes. The key fact is that these nodes can be chosen
from a single path in the tree. This path induces a natural linear ordering on the set of nodes
removed.

To see this, consider a binary tree rooted at a node of degree either one or two. It is always
possible to choose such a root, and if the remainder of the tree is drawn in levels then every
internal node has at most two sons. Label each node in the tree by the size of the subtree
rooted at that node. Pick any node whose label is no less than x, and both of
whose sons have labels less than x. Mark this node. If its label equals x then we have found a
node whose removal separates a subtree of the required size. Otherwise, one of its sons must
have a label y ≥ ⌈x/2⌉, while the other son has label no less than x − y − 1. Recursively
mark nodes in the subtree rooted at the second son so that the removal of the marked nodes
separates a forest of size x − y − 1. It is easily seen that the marked nodes lie along a path of the
original tree. Moreover, the removal of all marked nodes separates a component of size exactly
x. Finally, since the first node separates a component of size at least ⌈x/2⌉ + 1, it follows that
no more than ⌈lg x⌉ nodes are marked. Figure 3.6 illustrates this procedure.
Figure 3.6: Three cuts separate a subforest of 15 nodes.

Figure 3.7: The removed nodes are placed in the order
of occurrence along the path.



Given a tree, use the above procedure to find a set of nodes which bisect the tree, and lie 
along a path. Place these nodes at the center of the layout in the same order in which they are 
encountered along the common path. Next, find all nodes adjacent to the bisector set and place 
them on either side as before. However, the ordering of nodes in these breadth-first levels is 
chosen as follows: for each pair of nodes u, v that are placed next to each other in the bisector
set, if the path connecting them is u, t₁, t₂, …, t_k, v, then place nodes t₁ and t_k next to
each other in the second level, as shown in Figure 3.7. The orderings of nodes on either side of
the center again satisfy the condition that nodes connected by a path in the forest embedded
on that side appear in the order in which they are encountered along the common path.

By placing nodes in every level in the same order in which they lie along a common path 
within the forest still to be embedded, it is easy to guarantee that the layout is planar inside 
the channel (see Figure 3.7). All that remains is to guarantee that the layout can be made 
planar at every corner when new bisector sets are added to a level. 

When the end of a channel is reached, the situation is as shown in Figure 3.8. Nodes
u₁, u₂, …, u_n are those in the last level of the channel. The subgraph which remains to be
embedded is a forest of subtrees. The n nodes can be grouped according to which subtree they
belong to, nodes in the same subtree being adjacent within the ordering. To bisect this forest,
it suffices to split only one of these subtrees: order the subtrees top-down and pick the lowest
subtree such that the subforest above it contains at most one-half of all nodes in the forest.
Split this subtree into two components as required so that the original forest is
bisected. By laying out the next breadth-first level and the new bisector nodes as in Figure
3.8, we see that in each of the two lower-level channels the nodes within the same subtree are
ordered as they are encountered along a common path.

Figure 3.8: To bisect a forest of trees, only one tree need be separated.

Figure 3.9: Nodes in the final level may be connected to their subtrees without crossovers.

Repeating this process further down the H-channel structure, we see that the layout is free
of wire crossings. To complete the layout, within the final-level channels we use Valiant's [83]
linear-area planar layouts for each remaining subtree. Edges from these subtrees to nodes in
the last breadth-first level of the penultimate channel can be inserted without crossovers as
shown in Figure 3.9. This completes the planar layout. ∎
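The marking procedure in the proof of Theorem 3.4 translates directly into code. The Python sketch assumes the tree is a rooted dictionary mapping each node to its list of at most two sons. It returns the marked nodes, in order down the path, and the set of x nodes split off, counting the marked nodes themselves. These representation choices are illustrative.

```python
def subtree_sizes(children, root):
    """Label every node with the size of the subtree rooted at it."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):                     # descendants are labelled first
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return size


def add_subtree(children, v, out):
    """Add v and all of its descendants to the set out."""
    stack = [v]
    while stack:
        u = stack.pop()
        out.add(u)
        stack.extend(children.get(u, []))


def mark_path_separator(children, root, x):
    """Mark nodes along one downward path so that exactly x nodes are split off.

    Returns (marked, separated); separated has x nodes and includes the marked ones.
    """
    size = subtree_sizes(children, root)
    assert 1 <= x <= size[root]
    marked, separated = [], set()
    v = root
    while x > 0:
        # Walk down to a node with label >= x whose sons all have labels < x.
        while True:
            bigger = [c for c in children.get(v, []) if size[c] >= x]
            if not bigger:
                break
            v = bigger[0]
        marked.append(v)
        separated.add(v)
        if size[v] == x:                          # its whole subtree is the piece
            for c in children.get(v, []):
                add_subtree(children, c, separated)
            break
        # size[v] > x with both sons < x forces two sons; keep the larger one whole.
        first, second = sorted(children[v], key=lambda c: size[c], reverse=True)
        add_subtree(children, first, separated)
        x -= size[first] + 1
        v = second                                # recurse into the other son
    return marked, separated


tree = {v: [2 * v, 2 * v + 1] for v in range(1, 16)}   # 31-node complete tree, heap numbering
print(mark_path_separator(tree, 1, 10)[0])
```

On the 31-node complete binary tree with heap numbering, asking for x = 10 marks nodes 2 and 10. These lie on one downward path.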

3.4. The Complexity of Minimizing Edge Lengths

Thus far we have only shown that every tree can be laid out with maximum edge length
bounded by O(√N/lg N). While this bound is asymptotically optimal for some trees, such as
the complete binary tree, it is far from tight for others. For example, a two-ended string with
every node connected only to its immediate neighbors can be trivially laid out with every edge of
length one, independent of the number of nodes.

This motivates the problem: given a tree, produce a layout with minimax edge length. In
this section we show that determining the minimax edge length is computationally intractable.
The results are quite discouraging: even the problem of deciding if a given tree can be laid
out with all edges of unit length is NP-complete.

Theorem 3.5. Given a tree T, deciding whether or not T has a layout with unit length 
edges is NP-complete. 

Proof. Observe that the problem is clearly in NP; it is easy to guess a layout and verify 
that no edge has length greater than one. It remains to show that the problem is NP-hard. 

The known NP-complete problem used in the reduction is the NOT-ALL-EQUAL 3CNFSAT 
problem [29, 72] stated below. 

NOT-ALL-EQUAL 3CNFSAT: Given a boolean formula φ in 3CNF (conjunctive
normal form with three literals per clause), does there exist a truth assignment which satisfies
φ such that each clause contains at least one false literal?

Given a formula φ in 3CNF, we construct a graph G with the property that G can be
laid out with all edges of unit length iff φ is an instance of NOT-ALL-EQUAL 3CNFSAT,
i.e., φ can be satisfied with at least one false literal per clause. The graph G is constructed
from elementary components termed "lines" (Figure 3.10). The crucial property of a line is
its rigidity, meaning that in any layout with unit-length edges, nodes u₁, …, u_n must be lined
up either horizontally or vertically. Figure 3.10 shows how to connect two lines so that the
resulting graph can be laid out in only two ways (ignoring rotations).

Let x₁, …, x_n be the variables, and C₁, …, C_m be the clauses of φ. The basic "skeleton"
of G is shown in Figure 3.11. For each j, 1 ≤ j ≤ m, the distances (number of intermediate
nodes) between q_j and C_{j,1} and between q'_j and C'_{j,1} are all equal. The line from u_i to
v_i corresponds to variable x_i, and the two ways of embedding it with respect to the A-B axis
correspond to assigning x_i true or false.

Figure 3.10: (a) A "rigid" line with exactly one unit-length layout.
(b) Two rigid lines connected as shown can be laid out in exactly two ways.

Figure 3.11: The skeleton of the transformation. Each column represents a variable, while
each clause is associated with two rows that are mirror images with respect to the x-axis.

Thus far, there are 2ⁿ possible ways of laying out G with unit-length edges, each corresponding
to a truth assignment to the variables of φ. Next, we encode within G the "structure" of
φ as described below.

Let clause C_j be denoted l_{j,1} ∨ l_{j,2} ∨ l_{j,3}. If a literal l_{j,r} is positive (x_i), add a
"striker" at node C_{j,i}. Otherwise, if l_{j,r} is negative (x̄_i), add a striker at node C'_{j,i}.
Finally, for every variable x_k that does not occur in C_j, add strikers both at C_{j,k} and at
C'_{j,k}. Figure 3.12 shows, for example, the strikers added for a clause C₁ over the variables
x₁, x₂, x₃.

Think of a node without a striker as a "hole". The rows C_j and C'_j together share three
holes and 2n − 3 strikers. Because of the boundary constraints at the sides, no more than
n − 1 of these strikers may lie on either side of the A-B axis. In other words, for each clause
there must be at least one hole on either side of the axis in a unit-length layout. For each
clause, a hole "above" the axis implies a truth assignment which makes the clause true, while
a hole "below" the axis implies at least one false literal within the clause. Therefore, there
is a unit-length layout if and only if the formula is satisfiable with at least one false literal
per clause. In short, G has a unit-length layout iff φ is an instance of NOT-ALL-EQUAL
3CNFSAT. Since the reduction is easily carried out in polynomial time, the theorem follows.

Figure 3.12: In any unit-length layout each row contains at least one hole; this corresponds
to an instance of NAE-3CNFSAT.

Figure 3.13: A binary tree which has a unique (up to rotations) unit-length layout.
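For small formulas, the source problem of the reduction can be checked by brute force. The sketch encodes literals as signed integers (k for x_k, −k for its negation), a convention chosen for convenience rather than taken from the text. It tries all 2ⁿ assignments; exponential time is expected, since the problem is NP-complete.

```python
from itertools import product


def literal_value(lit, assignment):
    """Literal k > 0 stands for x_k and -k for its negation (variables numbered from 1)."""
    value = assignment[abs(lit) - 1]
    return value if lit > 0 else not value


def nae_assignment(num_vars, clauses):
    """Search all assignments for one that gives every clause a true and a false literal.

    Exponential in num_vars; returns an assignment tuple or None.
    """
    for assignment in product((False, True), repeat=num_vars):
        if all(
            any(literal_value(l, assignment) for l in clause)
            and not all(literal_value(l, assignment) for l in clause)
            for clause in clauses
        ):
            return assignment
    return None


all_signs = [(a * 1, b * 2, c * 3) for a, b, c in product((1, -1), repeat=3)]
print(nae_assignment(3, [(1, 2, 3)]), nae_assignment(3, all_signs))
```

Consider the eight clauses formed from every sign pattern on x₁, x₂, x₃. For any assignment, the clause whose literals are all true under it is violated, so this formula has no NOT-ALL-EQUAL assignment.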



In the above reduction, many nodes had degree four. We may strengthen the result to 
binary trees with maximum degree 3. A rigid line may be implemented by stringing together 
binary trees as shown in Figure 3.13. It is not hard to show that the structure is rigid; the key 
property is that the complete binary tree on 31 nodes has a unique (up to rotations) unit-length
layout. This yields the following result. 
Corollary 3.6. Given a binary tree, deciding whether or not it has a layout with unit-length
edges is NP-complete.



3.5. Assembling Complete Trees 

Whenever any system is larger than a single chip, it is necessary to partition it among 
separate chips which can be assembled at the printed circuit (or chip carrier) level. What is 
the most effective way to partition a large binary tree among several chips? 

This question is pressing because although integrated circuit technology has been advancing 
at a rapid pace, the technology for packaging chips has been crawling in comparison. Packaging 
technology severely restricts the number of external connections to an integrated circuit. While 
the number of components per chip is expected to reach one hundred million, no one foresees
chips with more than two or three external pin connections. 

This section presents Leiserson's scheme [50] for assembling complete binary trees using one 
kind of chip with only four external pin connections. This chip has been used in tree-machine 
projects at Caltech and Bell Laboratories [16]. We review this scheme here for its simplicity
and because the general scheme developed in Section 3.7 is based on similar ideas. 

Figure 3.14 shows how arbitrarily large complete binary trees can be built out of a single 
chip that has only four off-chip connections. Each chip contains one internal node of the tree, 
and the remainder of the chip is packed as full as possible with an H-tree layout. The internal
node requires three off-chip connections (denoted F, R, and L in the figure) for its father, right
son, and left son. The H-tree requires only one off-chip connection (denoted T) to its father.

To interconnect two chips, the unconnected internal node of one of the two chips is selected
as the father of the two H-trees. In Figure 3.14 the internal node on the left has been chosen for
this purpose. The R pin on this chip is connected to its own T pin, and the L pin is connected
to the T pin on the other chip. Considered as a unit, the combined two chips now have the
same structure as a single chip: three connections to an internal node and one to the root
of a complete binary tree. The pair of chips can be similarly combined with another pair to
produce a quadruple of chips, which can in turn be combined, and so forth. Figure 3.15 shows
a large complete binary tree which has been wired up in this recursive fashion.

Figure 3.15: A large complete binary tree assembled using
many copies of the same chip.
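The wiring rule can be checked with a small counting model. The Python sketch below is illustrative only: `Unit`, `combine`, and `assemble` are hypothetical names, and the model tracks just two quantities per unit (its chip count and the height of the complete tree reached through pin T), assuming each chip's H-tree holds a complete tree of height `h`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A chip, or a group of wired-up chips, seen from the outside.

    Every unit exposes exactly one unconnected internal node (pins F, R, L)
    and one complete binary tree whose root is reached through pin T.
    """
    chips: int
    tree_height: int


def single_chip(h: int) -> Unit:
    # One internal node plus an H-tree layout of a complete tree of height h.
    return Unit(chips=1, tree_height=h)


def combine(a: Unit, b: Unit) -> Unit:
    """Wire R of a to a's own T pin and L of a to b's T pin.

    a's free internal node becomes the father of both trees, so the tree grows
    one level; b's internal node is now the combined unit's free internal node.
    """
    assert a.tree_height == b.tree_height, "only equal trees can be paired"
    return Unit(chips=a.chips + b.chips, tree_height=a.tree_height + 1)


def tree_nodes(u: Unit) -> int:
    # Number of nodes in a complete binary tree of the unit's height.
    return 2 ** (u.tree_height + 1) - 1


def assemble(h: int, k: int) -> Unit:
    """Combine 2**k identical chips into pairs, then quadruples, and so on."""
    units = [single_chip(h) for _ in range(2 ** k)]
    while len(units) > 1:
        units = [combine(units[i], units[i + 1]) for i in range(0, len(units), 2)]
    return units[0]
```

For example, eight chips whose H-trees each hold 7 nodes become a 63-node complete tree with one internal node left over, so every node on every chip is used.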



Unlike the assembly for complete trees, configurable or restructurable designs are required
for assembling arbitrary binary trees. The reason is simple: a single fixed chip with $N$ processors
can realize only one $N$-node binary tree. In order to realize every $N$-node binary tree, either a
new mask must be designed for each tree, or else connections on the chip must be restructured
(for example, by laser) after fabrication. Given the ability to restructure wires on a chip, we
ask: Is there an area-efficient restructurable chip with $N$ processors and $m$ pins ($m \ll N$)
which can be used to assemble every binary tree, independent of its size?
This question is affirmatively answered in Section 3.7. The solution depends heavily on
the results of the next section, which considers the problem of partitioning a binary tree into
subforests of size $N$ so that every subforest has at most $O(\lg N)$ edges connected to nodes in
other subforests. The solution to this problem leads directly to the restructurable chip design
of Section 3.7.

3.6. Collinear Layouts and Two-color Bisectors 

This section introduces the notion of two-color bisectors for trees. Two-color bisectors 
are a natural extension of graph bisectors, and will be critically used in partitioning graphs 
for layout. In this section we show how to use two-color bisectors to partition an arbitrary 
tree into subforests of size $N$ so that every subforest has at most $O(\lg N)$ edges connected to
nodes in other subforests. Bounds on the size of two-color bisectors are obtained from collinear 
layouts developed by Bentley and Leiserson [50]. 

Definition. Suppose that an $N$-node graph $G$ has $b$ black nodes and $w$ white nodes. A two-color
bisector for $G$ is a set of edges whose removal bisects $G$ into two subgraphs each of size
at least $\lfloor N/2 \rfloor$, and such that each contains at least $\lfloor b/2 \rfloor$ black and $\lfloor w/2 \rfloor$ white nodes.



Theorem 3.7. Every $N$-node forest of binary trees has a two-color bisector of size no greater
than $2\lg N$.

Proof. Following Bentley and Leiserson [50], construct a collinear layout for the forest
as follows. By removing one edge, separate the forest into two subforests so that neither
contains more than $\lfloor 2N/3 \rfloor + 1$ nodes [52]. If either component contains more than $\lfloor N/2 \rfloor$
nodes, separate it into two smaller components using the one-separator theorem again. Next,
recursively construct collinear layouts for each subforest, and place these layouts side-by-side
along the baseline. Finally, as shown in Figure 3.16, connect the two (or three) subforests by
routing the separator edges on distinct vertical tracks and along a common horizontal track.
(For two components this is trivial since only one edge is routed; for three components, place the
subforest connected to both other subforests in the middle as shown.) For each node there are
three vertical tracks to accommodate edges incident to that node.

Figure 3.16: The recursive construction of a collinear layout.

The height of the layout is determined by a simple recurrence relation. Let $h(N)$ be the
height of the layout, so that $h(1) = 0$, and in general,

$h(N) \le h(\lfloor N/2 \rfloor) + 1$.

A straightforward calculation yields $h(N) \le \lg N$.
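Taking the recurrence with equality gives its worst case. This short Python check is an illustration, not part of the original argument; `layout_height` is a hypothetical name, and the one track added per level is the common horizontal track described above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def layout_height(n: int) -> int:
    """Worst case of h(N) <= h(floor(N/2)) + 1 with h(1) = 0.

    Each level of the recursion adds one horizontal track above the sublayouts.
    """
    if n <= 1:
        return 0
    return layout_height(n // 2) + 1
```

The worst case works out to exactly $\lfloor \lg N \rfloor$, which is where the bound $h(N) \le \lg N$ comes from.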

Thus far we have ignored the coloring on the nodes. Suppose there are $b$ black nodes and
$N - b$ white nodes. Consider a "window" which overlaps $\lfloor N/2 \rfloor$ consecutive nodes, and place
it over the leftmost $\lfloor N/2 \rfloor$ nodes. If more than $\lfloor b/2 \rfloor$ black nodes fall within the window, slide
the window one position at a time to the right. Observe that by sliding the window one position, the
number of black nodes within the window changes by at most one. Furthermore, once the window
has slid all the way to the right, at most $\lfloor b/2 \rfloor$ black nodes fall within it.
Consequently, there must be an intermediate placement of the window (see Figure 3.17) in
which exactly $\lfloor b/2 \rfloor$ black nodes and $\lfloor (N-b)/2 \rfloor$ or $\lceil (N-b)/2 \rceil$ white nodes are contained within the
window. (Such a placement can be obtained in linear time.)
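The window argument is a discrete intermediate-value argument, and it translates directly into a linear-time scan. In this sketch the forest's nodes are assumed to be given in the baseline order of the collinear layout as a 0/1 list (1 for black). `balanced_window` is a hypothetical helper, and the two vertical lines of the next paragraph would be drawn at positions `s` and `s + N // 2`.

```python
from typing import List


def balanced_window(colors: List[int]) -> int:
    """Return the start s of a window of floor(N/2) consecutive baseline nodes
    that contains exactly floor(b/2) black nodes.

    colors[i] is 1 if the i-th node along the baseline is black, 0 if white.
    The black count changes by at most one per slide, so such a window exists.
    """
    n = len(colors)
    width = n // 2
    target = sum(colors) // 2
    count = sum(colors[:width])          # black nodes in the window [0, width)
    for s in range(n - width + 1):
        if count == target:
            return s
        if s + width < n:                # slide: drop colors[s], add colors[s + width]
            count += colors[s + width] - colors[s]
    raise AssertionError("no balanced window; impossible by the sliding argument")
```

For the baseline `[1, 1, 1, 1, 0, 0, 0, 0]` the scan stops at `s = 2`, where the window holds two black and two white nodes.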

Draw vertical lines through the endpoints of the window in the position obtained above.
The edges of the forest intersecting these lines form a two-color bisector of the forest. The size
of this two-color bisector is no more than twice the height of the layout; in other words, the
size of the two-color bisector is no more than $2\lg N$. ∎
Figure 3.17: At some point, a window of size $N/2$ slid along
the baseline must contain half the black and half
the white nodes.



For our purpose the following variant of two-color bisectors is appropriate. Suppose each
node of an $N$-node forest is assigned a weight from a bounded set $\{1, 2, \ldots, k\}$ of weights. We
wish to bisect the forest into two equal-size subforests whose total weights differ by at most $k$.
How many edges need be cut? Adapting the argument for two-color bisectors to this variant
in a straightforward manner shows again that $2\lg N$ cuts suffice.
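The weighted variant changes only the quantity being tracked. This sketch uses a hypothetical helper name and assumes the weights are integers in $\{1, \ldots, k\}$ listed in baseline order. It scans every window of $\lfloor N/2 \rfloor$ consecutive nodes and reports the one whose inside and outside weights differ least. By the same discrete-continuity reasoning as in the two-color case, that difference is at most $k$: one slide changes it by at most $2(k-1)$, and it either changes sign along the way or is already at most $k$ at one end.

```python
from typing import List, Tuple


def weighted_window(weights: List[int]) -> Tuple[int, int]:
    """Return (start, gap) for the window of floor(N/2) consecutive baseline
    nodes minimizing gap = |weight inside the window - weight outside it|."""
    n = len(weights)
    width = n // 2
    total = sum(weights)
    inside = sum(weights[:width])
    best_start, best_gap = 0, abs(2 * inside - total)
    for s in range(1, n - width + 1):
        # The window moves from [s - 1, s - 1 + width) to [s, s + width).
        inside += weights[s + width - 1] - weights[s - 1]
        gap = abs(2 * inside - total)
        if gap < best_gap:
            best_start, best_gap = s, gap
    return best_start, best_gap
```

As before, the edges crossing the two window boundaries form the bisector, so the cut count is still bounded by twice the layout height.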

Having obtained bounds on the size of two-color bisectors for forests, we wish to use them 
for partitioning an arbitrary binary tree into subforests of size at most $N$ so that every subforest
has $O(\lg N)$ edges connected to nodes in other subforests. This result is established in the
following theorem.



Theorem 3.8. Every $N$-node binary tree can be partitioned into $\lceil N/M \rceil$ subforests, each of
size at most $M$, such that no subforest has more than $4\lg M + 8$ edges connected to nodes
in other subforests.

Proof. First bisect the tree into two subforests, each of size at least $\lfloor N/2 \rfloor$, by cutting
no more than $\lg N$ edges. Split each subforest recursively as follows: for each node in a
recursively split component of size $m$, assign a weight equal to the number of edges incident
to that node which were cut at a previous level. Since the degree of a node is at most
three, the weight assigned to a node is at most 2. From the argument following Theorem 3.7,
there is a weighted bisector of size no greater than $2\lg m$ for the component. This weighted
bisector divides the number of external connections almost equally (the difference is at most
two) between the subcomponents of sizes $\lfloor m/2 \rfloor$ and $\lceil m/2 \rceil$. As seen in Figure 3.18, the number
of external connections into either of the new subcomponents is no more than the size of the
weighted bisector plus one-half the number of external connections into the component just
split (plus two). This recursive decomposition terminates when each component has size at
most $M$. Letting $E(m)$ be the number of external connections into any component of size $m$,
we have $E(N) = 0$, and

$E(m) \le \tfrac{1}{2} E(2m) + 2\lg(2m) + 2$.

A little calculation shows that $E(m) \le 4\lg m + 8$. This means that every subforest of size
$m$ in the recursive decomposition has at most $4\lg m + 8$ external edges to other subforests.
Substituting $M$ for $m$, the result follows. ∎

Figure 3.18: To keep the number of external connections to
all subcomponents small when a component is
bisected, the external connections must be evenly
divided between the subcomponents.

3.7. Assembling Arbitrary Trees 

The recursive decomposition of Theorem 3.8 leads directly to the design of an efficient 
restructurable chip which can assemble all trees. Observe that the layouts developed in earlier 
sections cannot be used for configurable or restructurable design because the locations at which 
nodes are embedded are determined by the structure of the tree and are not the same for all 
trees. The only way to have nodes at fixed locations, independent of the tree structure, is by 
predetermining the tracks along which edges are routed. 

We can predetermine the tracks along which edges are routed by using restructurable
permuters. A permuter $P_k$ has $k$ terminals on each side of a rectangle and can realize any
one-to-one connection between the terminals. The switch shown in Figure 3.19 implements a
permuter. It has dimensions $2k \times k$, with the terminals along the longer sides.
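Figure 3.19 is not reproduced here, so the following Python model of a $2k \times k$ permuter is an assumption rather than a transcription of the figure. Each of the $2k$ terminals drives a vertical stub across $k$ horizontal tracks, and connection $i$ uses track $i$, so every connection costs two welds (the count quoted for this switch later in the section). `permuter_welds` and `trace` are hypothetical names.

```python
from typing import Dict, List, Tuple

Weld = Tuple[int, int]  # (column, track)


def permuter_welds(perm: List[int]) -> List[Weld]:
    """Welds realizing top terminal i -> bottom terminal perm[i] in a P_k.

    Hypothetical geometry: columns 0..k-1 are stubs of the top terminals,
    columns k..2k-1 are stubs of the bottom terminals, and rows 0..k-1 are
    horizontal tracks, giving a 2k x k switch. Connection i uses track i.
    """
    k = len(perm)
    assert sorted(perm) == list(range(k)), "perm must be one-to-one"
    welds: List[Weld] = []
    for i, j in enumerate(perm):
        welds.append((i, i))        # top stub i joins track i
        welds.append((k + j, i))    # bottom stub j joins track i
    return welds


def trace(welds: List[Weld], k: int) -> List[int]:
    """Follow the welds to recover which bottom terminal each top terminal reaches."""
    track_of_top: Dict[int, int] = {c: t for c, t in welds if c < k}
    bottom_on_track: Dict[int, int] = {t: c - k for c, t in welds if c >= k}
    return [bottom_on_track[track_of_top[i]] for i in range(k)]
```

Because every stub is welded once and every track carries exactly one connection, any permutation is realizable in this model.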
Figure 3.19: A permuter can realize any set of one-to-one
connections between the terminals on its two
sides.

Figure 3.20: A restructurable chip which can assemble arbitrarily
large binary trees.



The construction of the restructurable chip is recursive and follows the recursive decomposition
of Theorem 3.8. We shall use $R_m$ to denote a level of the recursive layout with $m$
nodes, and let $R_M$ denote the restructurable chip of $M$ nodes itself. Figure 3.20 shows how
the chip $R_M$ is constructed from four copies of $R_{M/4}$, four copies of $P_{4\lg M}$, and two copies of
$P_{4\lg M + 4}$. Letting $S(M)$ be the length of the side of the layout, we have $S(1) = 1$ and

$S(M) \le 2S(M/4) + O(\lg M)$,

which yields $S(M) = O(\sqrt{M})$, so that the area is linear in $M$. The number of pins on $R_M$ is
$4\lg M + 8$. We now show that every large tree can be assembled using $R_M$.
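Both quantities in the paragraph above can be checked numerically. In this sketch `c` is a hypothetical constant standing in for the $O(\lg M)$ term, and $M = 4^t$ so that $\lg M = 2t$. Dividing the recurrence by $\sqrt{M} = 2^t$ gives $S(4^t)/2^t = 1 + c\sum_{l=1}^{t} 2l/2^l < 1 + 4c$, hence $S(M) < (1 + 4c)\sqrt{M}$. The pin check reads the permuter sizes of Figure 3.20 as the pin counts of the quarter-size and half-size components, $4\lg(M/4) + 8 = 4\lg M$ and $4\lg(M/2) + 8 = 4\lg M + 4$. That reading is an interpretation, not something the text states.

```python
def side_length(t: int, c: int = 1) -> int:
    """S(M) for M = 4**t, taking S(M) = 2*S(M/4) + c*lg M with equality.

    c is a hypothetical constant standing in for the O(lg M) permuter width;
    lg M = 2t when M = 4**t.
    """
    s = 1                            # S(1) = 1
    for level in range(1, t + 1):    # M = 4**level at this step
        s = 2 * s + c * 2 * level
    return s


def component_pins(m: int) -> int:
    """4 lg m + 8 external connections for a component of m nodes (m a power of two)."""
    return 4 * (m.bit_length() - 1) + 8
```

The side length stays within a constant factor of $\sqrt{M}$ for every $t$, so the area is linear in $M$.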
Theorem 3.9. Suppose each restructurable chip contains $M$ nodes. Then any $N$-node
binary tree can be assembled using $\lceil N/M \rceil$ chips, the minimum possible.

Proof. Following Theorem 3.8, decompose the tree into $\lceil N/M \rceil$ components, each of size
at most $M$ and having no more than $4\lg M + 8$ external edges to other components. Each of
the $\lceil N/M \rceil$ components can be realized on a single chip $R_M$. To see this, use Theorem 3.8 to
recursively decompose each component into single nodes. In this decomposition each subforest
of size $m$ has at most $4\lg m + 8$ external edges. This decomposition may now be mapped
directly onto the chip, using the permuters to route edges between different subcomponents.
Since the number of external edges at any level is no greater than the size of the permuters at
that level, the permuters can realize the desired routing. Nodes of the tree are embedded at
fixed positions in the lowest-level permuters. Finally, each chip has enough pin connections
so that the assembly can be completed off-chip by connecting the chips together as required
by the decomposition. (Permuters are not needed off chip because wires can be routed
directly.) ∎

The constant factors on area can be improved if one uses the smaller restructurable
permuter $P_k$ with dimensions $(k + O(\sqrt{k})) \times (k + O(\sqrt{k}))$ that follows from the channel routing
algorithm of Part II of this thesis. Whereas the simpler permuter from Figure 3.19 requires
only two welds to make a connection, the dense layout might require as many as $k$ welds for
each connection. Although the total number of welds required by either scheme is $O(M)$, the
number per wire is $O(\lg M)$ if the simpler switch is used and $O(\lg^2 M)$ if the channel-routing
permuter is used.

In related work, Rosenberg [69] has also considered permuters to obtain a degree of 
configurability in layouts. 



CHAPTER 4 



The General Framework 



This chapter presents a new framework for general graph layout. Like previous approaches 
to graph layout, the new framework is based on the divide-and-conquer paradigm. Instead of 
using a separator theorem to recursively partition a graph, the new framework uses graph 
bifurcators. The notion of a graph bifurcator was introduced by Leighton [42] to overcome the
deficiency of separator theorems. Although the differences between bifurcators and separator
theorems will be elaborated in this chapter, there are two primary advantages of bifurcators over
separator theorems. First, unlike separator theorems, bifurcators may be efficiently computed
either with a good graph partitioning heuristic or from a layout with small area. Second,
bifurcators can be used, as in the next chapter, to produce layouts that are efficient in a variety
of respects, not layout area alone.

The techniques for general graph layout closely parallel those in Chapter 3 for efficient
tree layout. Section 4.1 examines multi-colored bisectors for two-ended strings and forests of
complete binary trees, and generalizes the results of Section 3.6 to more than two colors.
Section 4.2 introduces decomposition trees and bifurcators as generalizations of separator
theorems. Section 4.3 considers the problem of balancing decomposition trees, just as Section
3.6 considered the problem of decomposing a tree while balancing the number of external edges
among split components. Section 4.4 introduces the tree of meshes, which is a generalization
of the restructurable chip of Section 3.7, and investigates techniques for embedding general
graphs within the tree of meshes, given a balanced decomposition tree for the graph. Section
4.5 concludes by developing good layouts for the tree of meshes.

Taken together, an embedding of a graph within the tree of meshes and a good layout for
the tree of meshes induce a good layout for the embedded graph. Given a decomposition tree,
the strategy for laying out a general graph is to balance the decomposition tree, embed the graph
within the tree of meshes, and lay out the tree of meshes. In Chapter 5 we will see how this
strategy can be used to efficiently solve all the layout problems described in Chapter 2.

4.1. Combinatorial Lemmas 

This section contains three combinatorial lemmas which provide the foundation for the 
framework presented in the next section. 

Lemma 4.1. Consider any two-ended string of $n$ colored pearls of $k$ different colors, and
let $n_i$ be the number of pearls which are color $i$ for $1 \le i \le k$. For any integer $r \ge 2$,
the pearls can be partitioned into two sets by cutting the string in no more than $9r^k$ places
such that the total number of pearls in each set is $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$, the number of pearls of
color 1 in each set is $\lfloor n_1/2 \rfloor$ or $\lceil n_1/2 \rceil$, and such that the number of pearls of color $i > 1$
in each set lies between $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor$ and $\lfloor (\frac{1}{2} + \frac{1}{2r}) n_i \rfloor$.

Proof. Let $i$ be a number between 1 and $k$, and let $T(i)$ denote the number of cuts necessary
to divide the set of all pearls into two sets that satisfy the constraints of the lemma for colors
$1, 2, \ldots, i$. Other than requiring that the total number of pearls be split in half by the cuts, we
have made no constraints on the distribution of pearls with colors greater than $i$. We wish to
find a good bound on $T(i)$ in the worst case, i.e., over all choices of $n$, $k \ge i$, and all possible
colorings. In what follows, we will show that $T(1) = 2$ and that

$T(i) \le rT(i-1) + 4r + 7$

for $i > 1$. As a consequence, we can solve the recurrence to conclude that $T(i) \le 9r^i - 15$ for
$r \ge 2$. Thus for $i = k$, at most $9r^k$ cuts are required, as claimed.

For $i = 1$, the argument used in Theorem 3.7 shows that two cuts suffice. Consider a
"window" of size $\lfloor n/2 \rfloor$ positioned at the left end of the string. Without loss of generality,
assume that the window covers fewer than $\lfloor n_1/2 \rfloor$ of the pearls colored 1. Move the window to
the right, one pearl at a time, until the window covers $\lfloor n_1/2 \rfloor$ pearls of color 1. Since the right
half of the string contains more than one-half of all pearls of color 1, there must, by continuity,
exist a placement when the window covers exactly one-half of all pearls of color 1. By cutting
the string at the endpoints of the window, the portion of the string under the window will
contain half of the total number of pearls and half of the pearls colored 1. Hence $T(1) = 2$, as
claimed.

For a given $i > 1$, break the string into $r$ segments $S_j$, $1 \le j \le r$ (making $r - 1$ cuts), so
that each segment contains at least $\lfloor n_i/r \rfloor$ pearls of color $i$. Next split each $S_j$ into two subsets
$S_{j0}$ and $S_{j1}$ (making a total of $rT(i-1)$ cuts) so that each split satisfies the lemma locally
for colors $1, 2, \ldots, i-1$.

Without loss of generality, assume that $S_{j0}$ contains no fewer pearls of color $i$ than $S_{j1}$.
At this stage, we divide the set $C$ of all pearls into two subsets $C_1$ and $C_2$ as follows. Initially,
let $C_1 = \bigcup_j S_{j0}$. If $C_1$ contains more than $\lfloor (\frac{1}{2} + \frac{1}{2r}) n_i \rfloor$ pearls of color $i$, remove $S_{10}$ from $C_1$
and add $S_{11}$. Repeat this procedure, successively switching $S_{20}$ with $S_{21}$, $S_{30}$ with $S_{31}$, and so
on until the first time $C_1$ has at most $\lfloor (\frac{1}{2} + \frac{1}{2r}) n_i \rfloor$ pearls of color $i$. Such a stage must occur
since the number of pearls of color $i$ in $C_1$ falls to at most $\lfloor n_i/2 \rfloor$ if $C_1$ and $C_2$ are
completely interchanged. The number of pearls of color $i$ in $C_1$ after the final switch cannot
be less than $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor - 2$ since every $S_j$ contains no more than $\lceil n_i/r \rceil$ pearls of color $i$. If
the number of pearls of color $i$ in $C_1$ is $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor - 1$ or $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor - 2$, then move either
one or two pearls of color $i$ from $C_2$ to $C_1$, making no more than four cuts.

We also have to ensure that the total set of pearls and the pearls of the first $i - 1$ colors are
divided as required. The pearls with colors between 2 and $i - 1$ are divided correctly because
they were divided correctly at the recursive step. The counts of pearls of color 1 in $C_1$ and $C_2$
may differ by $r$, however. To balance the number of pearls with color 1 in each set, we
need only remove up to $\lfloor r/2 \rfloor$ pearls colored 1 from the excess set (making at most $r$ cuts) and
put them in the deficient set. To balance the difference in the overall sizes of the sets (which
now might be as large as $2r + 4$), we need only extract up to $r + 2$ pearls from the larger set
(making no more than $2r + 4$ cuts) and put them in the smaller set. Of course, these pearls
must be chosen carefully so that each set retains the required minimum number of pearls of
each color. Since pearls are extracted only from the larger set, it is clear that this requirement
may be easily satisfied.

The total number of cuts made by the procedure is $rT(i-1) + 4r + 7$, as claimed. ∎
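The closed form for the cut count follows by induction. If $T(i-1) \le 9r^{i-1} - 15$, then $T(i) \le 9r^i - 15r + 4r + 7 = 9r^i - 11r + 7$, which is at most $9r^i - 15$ exactly when $r \ge 2$. The base case $T(1) = 2 \le 9r - 15$ also holds for $r \ge 2$. A quick numeric check of the recurrence taken with equality (the function name is hypothetical):

```python
def cut_bound(i: int, r: int) -> int:
    """Worst case of T(1) = 2, T(i) = r*T(i-1) + 4r + 7 from the proof of Lemma 4.1."""
    t = 2
    for _ in range(2, i + 1):
        t = r * t + 4 * r + 7
    return t
```

For $r = 2$ the recurrence gives $T(2) = 19$, just under the bound $9 \cdot 4 - 15 = 21$, so the constant 15 leaves little slack.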

Using an elegant topological argument, Goldberg and West [32] recently proved that $k$ cuts
suffice to divide the pearls of each color exactly in half. This dramatically reduces the number
of cuts, and makes our analysis significantly less cumbersome. All of our layout results may,
however, be proved with the weaker Lemma 4.1. Both results are implementable in polynomial
time when the number of colors is fixed, as is the case throughout this thesis.

Lemma 4.2. Consider any two-ended string of $n$ pearls, $n_i$ of which are colored $i$,
$1 \le i \le k$. By cutting the string in $k$ places it is possible to divide the pearls into two sets so
that each set has a total of $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ pearls, and $\lfloor n_i/2 \rfloor$ or $\lceil n_i/2 \rceil$ pearls of color $i$
for all $i$, $1 \le i \le k$.
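Lemma 4.2 can be spot-checked on short strings by exhaustive search. The sketch below is a brute force over all placements of at most $k$ cuts, exponential in $k$ and useful only for tiny inputs. It is not the polynomial-time procedure mentioned above, and the helper names are hypothetical. Pearls alternate between the two sets at each cut.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def sides_from_cuts(n: int, cuts: Sequence[int]) -> List[int]:
    """Assign pearls 0..n-1 to sets 0/1, switching sets at every cut.

    A cut at position c lies between pearls c - 1 and c.
    """
    side: List[int] = []
    current, start = 0, 0
    for boundary in list(cuts) + [n]:
        side.extend([current] * (boundary - start))
        start, current = boundary, current ^ 1
    return side


def near_half(part: int, whole: int) -> bool:
    return part in (whole // 2, (whole + 1) // 2)


def is_fair(pearls: Sequence[int], side: Sequence[int], k: int) -> bool:
    """Lemma 4.2's conditions, checked on set 1 (set 0 then holds by symmetry)."""
    if not near_half(sum(side), len(pearls)):
        return False
    for color in range(k):
        whole = sum(1 for p in pearls if p == color)
        part = sum(s for p, s in zip(pearls, side) if p == color)
        if not near_half(part, whole):
            return False
    return True


def split_necklace(pearls: Sequence[int], k: int) -> Optional[Tuple[int, ...]]:
    """Brute force: the fewest cuts (at most k) meeting Lemma 4.2, or None."""
    n = len(pearls)
    for m in range(k + 1):
        for cuts in combinations(range(1, n), m):
            if is_fair(pearls, sides_from_cuts(n, cuts), k):
                return cuts
    return None
```

For the string `[0, 0, 1, 1]` with two colors, a single cut cannot work, and the search returns the two cuts `(1, 3)`.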

In the following, we recast Lemma 4.2 in terms of complete binary trees, which will be 
particularly useful since the recursive decomposition of a graph may be viewed as a tree. The 
height of a tree is the length of the longest path from the root to a leaf, while the height of a 
forest is the maximum height of a tree in the forest. Finally, the level of a node in the forest 
is defined to be the height of the forest minus the length of the longest path from the node to 
a leaf. (Note that the top level is level zero.)

Lemma 4.3. Consider a forest of complete binary trees whose $n$ leaves are colored
arbitrarily with $k$ colors. Let $n_i$ be the number of leaves colored $i$ for $1 \le i \le k$. By
removing no more than $k$ nodes (as well as all incident edges) from each internal level of
the forest, it is possible to produce a new forest of complete binary trees, some subset of
which contains $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ leaves, and $\lfloor n_i/2 \rfloor$ or $\lceil n_i/2 \rceil$ leaves of color $i$ for each $i$,
$1 \le i \le k$.
Figure 4.1: An illustration of the procedure in Lemma 4.3.



Proof. Draw the trees in the canonical manner and place them side-by-side, in any order, 
so that the leaves of all trees are placed along a line. By applying Lemma 4.2 to the induced 
left-to-right ordering on the leaves of the forest, it is possible to break the ordering in no more
than $k$ places such that the union of the leaves contained in every other segment contains the
desired total number of leaves and the desired number of leaves of each color. 

For each break, remove the nodes (and incident edges) which are simultaneously ancestors 
of the leaf immediately to the left of the break and the leaf immediately to the right of the 
break. It is easily seen that at most one node is removed from each internal level of the forest 
for each break. Therefore, no more than k total nodes are removed from each internal level. 
In addition, the removal of the common ancestors of the leaves neighboring a break divides 
the associated tree into two or more complete binary trees, at least one on each side of the 
break. Thus the removal of all such nodes produces a forest of complete binary trees, subsets 
of which correspond precisely to the sets of leaves between pairs of adjacent break points. Thus 
the union of the subsets of trees corresponding to every other segment of leaves contains the 
desired number of leaves of each color. Figure 4.1 illustrates this procedure. ∎
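The node-removal step can be made concrete for a single complete binary tree. In this sketch nodes use heap numbering (root 1, children $2v$ and $2v+1$), which is an assumption of the example rather than the thesis's notation. A break between leaf positions `x` and `x + 1` removes the path from their lowest common ancestor up to the root, so it removes exactly one node on each level from the root down to that ancestor. In a forest, a break that falls between two different trees removes nothing. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Set


def common_ancestors(h: int, left_leaf: int) -> List[int]:
    """Nodes that are ancestors of both leaf `left_leaf` and leaf `left_leaf + 1`.

    Heap numbering: root 1, children 2v and 2v + 1, and leaves
    2**h .. 2**(h+1) - 1 in left-to-right order.
    """
    a, b = (1 << h) + left_leaf, (1 << h) + left_leaf + 1
    while a != b:                   # climb until the two paths meet at the LCA
        a, b = a // 2, b // 2
    path = []
    while a >= 1:                   # the LCA and all of its ancestors up to the root
        path.append(a)
        a //= 2
    return path


def level(v: int) -> int:
    return v.bit_length() - 1       # the root is level zero


def removed_nodes(h: int, breaks: List[int]) -> Set[int]:
    removed: Set[int] = set()
    for x in breaks:
        removed.update(common_ancestors(h, x))
    return removed


def removed_per_level(h: int, breaks: List[int]) -> Counter:
    """How many nodes are removed on each level; never more than len(breaks)."""
    return Counter(level(v) for v in removed_nodes(h, breaks))
```

In a tree of height 3, the break between leaves 3 and 4 removes only the root, while the break between leaves 0 and 1 removes nodes 4, 2, and 1, one per level.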






Figure 4.2: An $(F_0, F_1, \ldots, F_r)$-decomposition tree (each leaf is an empty graph or an isolated node).

4.2. Decomposition Trees and Bifurcators 

The recursive decomposition of a graph into smaller and smaller subgraphs may be viewed
as a decomposition tree. In particular, we say that a graph $G$ has an $(F_0, F_1, \ldots, F_r)$-decomposition
tree if $G$ can be decomposed into two subgraphs $G_0$ and $G_1$ by removing no more than $F_0$ edges
from $G$, and, in turn, both $G_0$ and $G_1$ can be decomposed into smaller subgraphs by removing
no more than $F_1$ edges from each, and so on until each subgraph is either empty or an isolated
node. Figure 4.2 illustrates this recursive decomposition.

As one might expect, the decomposition of a graph by separator theorems may be viewed
as a decomposition tree. It follows by definition that if a class of graphs has an $f(x)$-separator
theorem, then there are constants $\alpha$ and $\beta$ such that each graph in the class has a decomposition
tree of the form $(\beta f(N), \beta f(\alpha N), \beta f(\alpha^2 N), \ldots, \beta f(1))$. The converse is not necessarily true.
Subgraphs generated at each step of a decomposition by a separator theorem are constrained
to be proportional in size, whereas decomposition trees need not satisfy this constraint. Of
course, if the decomposition tree has precisely $\lg N$ levels, then subgraphs at each level must
be equal in size.

We shall be particularly interested in a special class of decomposition trees, namely bifurcators,
that is distinct from the class of separators.

Definition. An $N$-node graph has an $\alpha$-bifurcator of size $F$ (more simply, an $(F, \alpha)$-bifurcator)
if it has an $(F, F/\alpha, F/\alpha^2, \ldots, 1)$-decomposition tree.
Of particular interest is the class of $\sqrt{2}$-bifurcators. By the definition, we know that an
$N$-node graph has a $\sqrt{2}$-bifurcator of size $F$ if and only if it has an $(F, F/\sqrt{2}, F/2, \ldots, 1)$-decomposition
tree. The depth of this tree is no greater than $2\lg F$. In order to completely
decompose an $N$-node graph into individual nodes, the height of any decomposition tree cannot
be less than $\lg N$. Thus, $F$ must always be at least $\sqrt{N}$. On the other hand, $F$ need never exceed
$2N$, since every $N$-node graph with maximum node degree four has at most $2N$ edges.
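Written out, the two bounds on $F$ follow from the budgets allowed at each level. Here $d$ denotes the depth of the decomposition tree and $F_i$ the number of edges permitted at level $i$; both symbols are introduced only for this derivation.

```latex
F_i = \frac{F}{(\sqrt{2})^{i}}, \qquad F_i = 1 \iff i = 2\lg F,
\qquad\text{so}\qquad d \le 2\lg F;

\lg N \le d \le 2\lg F \;\Longrightarrow\; F \ge \sqrt{N},
\qquad
F \le |E(G)| \le \tfrac{4N}{2} = 2N.
```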

If a class of graphs has an $x^a$-separator theorem, where $a < 1/2$, and the corresponding
decomposition is balanced in that every graph is always decomposed into equal-size subgraphs,
then it is straightforward to show that every $N$-node graph in the class has a $\sqrt{2}$-bifurcator of
size $O(\sqrt{N})$. Similarly, if a class of graphs has a balanced separator theorem of size $x^a$ with
$a > 1/2$, then every $N$-node graph in the class has a $\sqrt{2}$-bifurcator of size $O(N^a)$.
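The "straightforward" step can be made explicit. At level $i$ of a balanced decomposition each subgraph has $N/2^i$ nodes, so the separator removes at most $c\,(N/2^i)^a$ edges for some constant $c$. The level-$i$ budget of a $\sqrt{2}$-bifurcator of size $F$ is $F/(\sqrt{2})^i$, so it suffices to compare the two term by term, taking $F = c\sqrt{N}$ in the first case and $F = cN^a$ in the second:

```latex
a < \tfrac12:\quad
\Bigl(\frac{N}{2^i}\Bigr)^{a}
  = \frac{\sqrt{N}}{(\sqrt{2})^{i}}\Bigl(\frac{N}{2^i}\Bigr)^{a-1/2}
  \le \frac{\sqrt{N}}{(\sqrt{2})^{i}}
  \quad\text{since } \frac{N}{2^i} \ge 1 \text{ and } a - \tfrac12 < 0;

a > \tfrac12:\quad
\Bigl(\frac{N}{2^i}\Bigr)^{a}
  = \frac{N^{a}}{2^{ia}}
  \le \frac{N^{a}}{2^{i/2}}
  = \frac{N^{a}}{(\sqrt{2})^{i}}.
```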

The converse is not true even if we consider only bifurcators whose corresponding decomposition
trees are balanced so that every graph is decomposed into equal-size subgraphs. For
example, the $N$-node graph $S_N$ defined in Section 2.3 has a balanced $\sqrt{2}$-bifurcator of size
$O(\sqrt{N}\lg N)$, but the smallest separator for this class of graphs is $\Omega(x/\lg x)$.

When translated into bounds on layout area, this seemingly minor difference between
bifurcators and separators is greatly magnified. Graphs with small layout area always have
small $\sqrt{2}$-bifurcators, but do not always have small separators. This is formalized in the
following lemma. Later on we will prove the converse: graphs with small $\sqrt{2}$-bifurcators always
have small layout area.



Lemma 4.4. If a graph $G$ can be laid out in area $A$, then $G$ has a $(\sqrt{A}, \sqrt{2})$-bifurcator.

Proof. Consider a vertical cut of length $\sqrt{A}$ through the center of the layout. Next, cut
each of the sublayouts horizontally through the center. Continuing this sequence of alternating
vertical and horizontal cuts, it is easy to see that at the $i$th step no more than $\sqrt{A}/2^{i/2}$ edges
are cut from each subgraph. This sequence of cuts yields a $(\sqrt{A}, \sqrt{2})$-bifurcator for $G$. ∎
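The cut lengths in the proof can be tabulated for a square $\sqrt{A} \times \sqrt{A}$ layout, which is what the proof implicitly assumes. Under the usual grid model, a straight cut of length $\ell$ crosses at most $\ell$ wires. The number of edges cut at step $i$ is therefore at most the cut length, which is $\sqrt{A}/2^{\lceil i/2 \rceil} \le \sqrt{A}/2^{i/2}$. The helper name is hypothetical.

```python
from typing import List


def cut_lengths(width: float, height: float, steps: int) -> List[float]:
    """Lengths of the cuts at steps 0..steps-1 when a width x height layout is
    cut alternately vertically and horizontally through the center of each
    sublayout."""
    lengths: List[float] = []
    for i in range(steps):
        if i % 2 == 0:
            lengths.append(height)   # a vertical cut is as long as the sublayout is tall
            width /= 2
        else:
            lengths.append(width)    # a horizontal cut is as long as the sublayout is wide
            height /= 2
    return lengths
```

For a $1024 \times 1024$ layout the first four cuts have lengths 1024, 512, 512, and 256, each within the $\sqrt{A}/2^{i/2}$ budget.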
4.2.1. Special Cases

Many graphs have decomposition trees in which the number of cuts decreases very slowly
as we go lower down the tree. In such cases the number of cuts at higher levels of the tree may
be very small. On the other hand, in decomposition trees corresponding to bifurcators, the
number of cuts permitted decreases smoothly as we go down the tree. It is conceivable, then,
that the bifurcator permits far more cuts at higher levels than are necessary. For example,
$N$-node binary trees have decomposition trees of height $O(\lg N)$ in which no more than 1 cut
is required at every level. Since the minimum bifurcator is at least $\sqrt{N}$, the decomposition
tree corresponding to the bifurcator allows far more cuts at the top levels than needed.

Similarly, some graphs have decomposition trees in which many cuts are required at the
top levels, but this number decreases very quickly as we go down the decomposition tree. In
such cases, the minimum bifurcator is large, so that decomposition trees corresponding to the
bifurcator do not underestimate the number of cuts required at the top level. However, they
do greatly overestimate the number of cuts at lower levels.

It is useful to separate such extreme cases from a general discussion. Of course, general
upper bounds are valid for graphs with extreme decompositions, but they may overestimate
the true bound. A particularly important reason for separating these classes is that many
computationally useful graphs such as binary trees fall into the first category, while cube-connected
cycles and multidimensional meshes fall into the second category.

An $N$-node graph is defined to have a type A $\sqrt{2}$-bifurcator if it has an $(O(\sqrt{N}), \sqrt{2})$-bifurcator
such that no more than $O((N/2^i)^a)$ cuts, $a < 1/2$, are required for each partition
at the $i$th level of the associated decomposition tree. Observe that at the higher levels of the
tree, $i \ll \lg N$, the number of cuts is far less than the $O(\sqrt{N}/2^{i/2})$ cuts allowed by the usual
bifurcator.

Similarly, an $N$-node graph is defined to have a type B $\sqrt{2}$-bifurcator if it has an $(O(N^a), \sqrt{2})$-bifurcator,
$a > 1/2$, such that only $O((N/2^i)^a)$ edges are cut in any partition at the $i$th level.
Observe that for the lower levels of the tree, $i \gg 1$, this quantity is far smaller than the
$O(N^a/2^{i/2})$ cuts allowed by the usual bifurcator.

For simplicity, we will prove results only for general $\sqrt{2}$-bifurcators in this thesis. However,
whenever there is a significant difference, results for the special cases are stated separately. The
proofs for these special cases are easily worked out, and closely follow the proofs for the general
cases.

4.3. Balanced Decomposition Trees 

Of particular interest to the layout results reported in this thesis are decomposition trees 
where at each step of the decomposition, the two subgraphs are nearly equal in size. This section 
considers such balanced decompositions and gives an effective procedure for transforming an 
arbitrary decomposition tree into one that is balanced. 

Formally, a decomposition tree for a graph $G$ is balanced if each subgraph $G_w$ in the tree is the father of two subgraphs $G_{w0}$ and $G_{w1}$ such that the numbers of nodes in the two subgraphs differ by at most 1. In addition, we say that a decomposition tree is fully balanced if it is balanced and if, for every subgraph $G_w$ in the tree, the set of edges connecting $G - G_w$ to $G_w$ is divided into two subsets of nearly equal size by the partition of $G_w$ into $G_{w0}$ and $G_{w1}$. (Here we allow the numbers of edge connections in the two subgraphs to differ by a small constant, say 5. For simplicity, however, we shall often ignore such small differences and assume that the nodes and connections are split evenly between the two subgraphs.)
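Both conditions in this definition can be checked mechanically. The sketch below assumes a hypothetical representation in which a decomposition tree is a nested pair of subtrees whose leaves are frozensets of graph nodes; the function names are illustrative, and `slack` plays the role of the small constant mentioned above.

```python
from typing import FrozenSet, List, Tuple, Union

Edge = Tuple[int, int]
# Hypothetical representation: a leaf is a frozenset of nodes of G;
# an internal node G_w is a pair (G_w0, G_w1) of subtrees.
Tree = Union[FrozenSet[int], Tuple["Tree", "Tree"]]


def nodes_of(t: Tree) -> FrozenSet[int]:
    """All nodes of G contained in the subgraph represented by t."""
    if isinstance(t, frozenset):
        return t
    left, right = t
    return nodes_of(left) | nodes_of(right)


def from_outside(child: FrozenSet[int], parent: FrozenSet[int], edges: List[Edge]) -> int:
    """Number of edges with one endpoint in `child` and the other in G - parent."""
    return sum(1 for u, v in edges
               if (u in child and v not in parent) or (v in child and u not in parent))


def is_fully_balanced(t: Tree, edges: List[Edge], slack: int = 5) -> bool:
    """True if every split balances node counts (within 1) and external edges (within slack)."""
    if isinstance(t, frozenset):
        return True
    left, right = t
    a, b = nodes_of(left), nodes_of(right)
    if abs(len(a) - len(b)) > 1:            # balanced: node counts differ by at most 1
        return False
    parent = a | b
    if abs(from_outside(a, parent, edges) - from_outside(b, parent, edges)) > slack:
        return False                        # edges from G - G_w must split nearly evenly
    return is_fully_balanced(left, edges, slack) and is_fully_balanced(right, edges, slack)
```

For example, on the 4-cycle 0-1-2-3-0 the tree $((\{0\},\{1\}),(\{2\},\{3\}))$ passes even with `slack=0`, whereas a chain-like tree fails the node-count condition at the root.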

Somewhat surprisingly, any decomposition tree may be transformed into a fully balanced 
one at little or no cost. We prove this in the following theorem which generalizes earlier results 
in [9, 40, 41, 42]. 

Theorem 4.5. Let $G$ be any $N$-node graph with an $(F_0, F_1, \ldots, F_r)$-decomposition tree $T$. Then $G$ has a fully balanced $(F'_0, F'_1, \ldots, F'_{\lg N})$-decomposition tree such that $F'_i = 6\sum_{j=i}^{r} F_j$ for $0 \le i \le \lg N$.
Proof. Let $\mathcal{F}$ be a forest of complete binary trees consisting initially of the decomposition tree $T$. Color the leaves of $T$ with two colors according to whether or not the subgraph of $G$ associated with the leaf is empty. Apply Lemma 4.3 ($k = 2$) to $\mathcal{F}$, removing the indicated nodes and edges of $T$. Each node of $T$ corresponds naturally to a set of edges of $G$, namely the edges whose removal splits the associated subgraph in two. Removing a node of $T$ corresponds to removing this cutset of edges from $G$. Since no more than 2 nodes are removed from each level of $\mathcal{F}$, the number of edges removed from $G$ in applying Lemma 4.3 does not exceed $2\sum_{i=0}^{r} F_i$, which is less than $F'_0$.

Further note that $G$ is divided into two disjoint subgraphs of nearly equal size by the removal of these edges. Each subgraph, in turn, corresponds in a natural way to a subforest of complete binary trees in $T$. Consider one such subgraph $G_0$ and color the leaves of the associated forest of complete binary trees $T_0$ using six colors as follows:

If the leaf corresponds to an empty subgraph, color the leaf with color 1. Otherwise, if the single node corresponding to the leaf is incident to exactly $j$ edges of $G$ removed earlier, $0 \le j \le 4$, then color the leaf with color $j + 2$.

By applying Lemma 4.3 ($k = 6$) to $T_0$, it is clear that $G_0$ can be decomposed into two disjoint subgraphs $G_{00}$ and $G_{01}$ of nearly equal size such that the number of edges from $G - G_0$ to $G_{00}$ is nearly equal to the number of edges from $G - G_0$ to $G_{01}$. Since at most 6 nodes were removed from each level of $T$, and since $T_0$ does not contain the root of $T$, we can conclude that no more than $6\sum_{s=1}^{r} F_s = F'_1$ edges were removed from $G_0$.

By applying the above argument recursively, the desired fully balanced decomposition tree is obtained. With each application of Lemma 4.3, the total number of leaves in each forest is cut in half, so that the biggest tree in any forest corresponding to a subgraph decreases in height by at least one. Also, $\lg N + 1$ levels suffice since the size of each subgraph is also halved at each step. ∎

Theorem 4.6. Every graph with a $\sqrt{2}$-bifurcator of size $F$ has a fully balanced $\sqrt{2}$-bifurcator of size $6(2+\sqrt{2})F$.

Proof. Immediate from Theorem 4.5, since $\sum_{j \ge 0} 2^{-j/2} \le 2 + \sqrt{2}$. ∎
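The arithmetic behind Theorem 4.6 can be checked numerically. The sketch below takes $F_j = F/2^{j/2}$, forms $F'_i = 6\sum_{j=i}^{r}F_j$ as in Theorem 4.5, and confirms $F'_i \le 6(2+\sqrt{2})F/2^{i/2}$ at every level; the values of $F$ and $r$ and the helper names are arbitrary illustrations.

```python
import math


def balanced_cut_bound(F: float, r: int, i: int) -> float:
    """F'_i = 6 * sum_{j=i}^{r} F_j with F_j = F / 2^(j/2)."""
    return 6 * sum(F / 2 ** (j / 2) for j in range(i, r + 1))


def theorem_4_6_holds(F: float, r: int) -> bool:
    """Check F'_i <= 6 (2 + sqrt 2) F / 2^(i/2) for every level 0 <= i <= r."""
    return all(
        balanced_cut_bound(F, r, i) <= 6 * (2 + math.sqrt(2)) * F / 2 ** (i / 2)
        for i in range(r + 1)
    )


# The geometric series sum_{j >= 0} 2^(-j/2) approaches 2 + sqrt(2).
partial = sum(2 ** (-j / 2) for j in range(200))
print(theorem_4_6_holds(1000.0, 40), partial, 2 + math.sqrt(2))
```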
Figure 4.3: The $4 \times 4$ tree of meshes $T_4$.

Remark. The procedure described in Theorems 4.5 and 4.6 can be implemented in polynomial 
time. 



4.4. Embeddings in the Tree of Meshes 

Leighton [40, 41] introduced the tree of meshes as an example of a planar graph that cannot be laid out in linear area. He also showed that every $N$-node planar graph can be embedded in an $O(N\lg N)$-node tree of meshes. In this section, we define the tree of meshes and describe a general strategy for embedding a graph in the tree of meshes.

The tree of meshes is formed by replacing each node of a complete binary tree with a mesh and each edge by several edges which connect meshes at consecutive levels. More precisely, the root of the complete binary tree is replaced by an $n \times n$ mesh (it is assumed that $n$ is a power of 2), the nodes at the second level are replaced by $n \times n/2$ meshes, those at the third level by $n/2 \times n/2$ meshes, and so on until the leaves of the tree are replaced by $1 \times 1$ meshes. As shown in Figure 4.3, each edge of the tree is replaced with edges connecting nodes on one side of the higher-level mesh to the top row of the mesh at the lower level. The resulting graph is called the $n \times n$ tree of meshes $T_n$. It is not difficult to see that $T_n$ has $N = 2n^2\lg n + n^2$ nodes, since each of its $2\lg n + 1$ levels contains $n^2$ nodes.
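The node count can be confirmed by enumerating the levels. The sketch below assumes that each level-$i$ mesh is $n/2^{\lfloor i/2\rfloor} \times n/2^{\lceil i/2\rceil}$, which matches the sizes listed above, and that level $i$ has $2^i$ meshes; the function names are illustrative.

```python
from typing import Optional, Tuple


def mesh_dims(n: int, i: int) -> Tuple[int, int]:
    """Side lengths of each level-i mesh of T_n: n x n, n x n/2, n/2 x n/2, ..."""
    return n >> (i // 2), n >> ((i + 1) // 2)


def tree_of_meshes_nodes(n: int, p: Optional[int] = None) -> int:
    """Nodes in T_n, or in the truncated tree T_{n,p} when p is given."""
    lg_n = n.bit_length() - 1
    if n != 1 << lg_n:
        raise ValueError("n must be a power of 2")
    last = 2 * lg_n if p is None else p
    total = 0
    for i in range(last + 1):
        rows, cols = mesh_dims(n, i)
        total += 2 ** i * rows * cols   # 2^i meshes at level i, each rows x cols
    return total


print([mesh_dims(4, i) for i in range(5)])   # [(4, 4), (4, 2), (2, 2), (2, 1), (1, 1)]
print(tree_of_meshes_nodes(4))                # 80 = 2*16*2 + 16
```

The same function with `p` set gives the size $(p+1)n^2$ of a truncated tree of meshes.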

In many cases, we use only the top levels of the tree of meshes. The subgraph consisting of levels $0, 1, \ldots, p$ ($p \le 2\lg n$) of $T_n$ is called a truncated tree of meshes $T_{n,p}$.
Theorem 4.7. There is a constant $c$ such that every $N$-node graph $G$ with an $(F,\sqrt{2})$-bifurcator can be embedded in $T_{cF,\,2\lg(N/F)}$. Moreover, the embedding is regular in the sense that $F^2/N$ nodes of $G$ are embedded in a regular fashion in each of the $N^2/F^2$ bottom-level meshes of $T_{cF,\,2\lg(N/F)}$.

Proof. We first use Theorem 4.6 to construct a fully balanced $\sqrt{2}$-bifurcator of size $6(2+\sqrt{2})F$ for $G$. We then use the internal meshes of $T_{cF,\,2\lg(N/F)}$ to route the edges that were removed in the upper $2\lg(N/F)$ levels of the fully balanced decomposition tree for $G$. The subgraphs in the $(2\lg(N/F))$th level of the decomposition tree (each of which has $\lfloor F^2/N \rfloor$ or $\lceil F^2/N \rceil$ nodes) are then embedded in the meshes on the bottom level of the truncated tree of meshes.

The internal meshes are used as restructurable permuters. As we saw in Section 3.7, terminals on opposite sides of a mesh can be connected in any order through the mesh. In general, if the number of wires routed through a mesh does not exceed any side length of the mesh, a routing may always be found. Similarly, a graph with $M$ nodes can always be embedded in a $4M \times 4M$ mesh with nodes placed in a regular fashion.

Consider only the top $2\lg(N/F) + 1$ levels of a fully balanced decomposition tree for $G$. Each of the subgraphs at level $2\lg(N/F)$ of the decomposition tree has $N(1/2)^{2\lg(N/F)} = F^2/N$ nodes. (For simplicity we shall assume that $F^2/N$ is an integer.) Furthermore, if $E_i$ is the maximum number of edges between $G - G_i$ and $G_i$, where $G_i$ is a subgraph in the decomposition tree at level $i$, then it is easy to see that $E_0 = 0$ and, by Theorem 4.6, that

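$$E_i \le \frac{1}{2}E_{i-1} + \frac{6(2+\sqrt{2})F}{2^{(i-1)/2}}$$

for $1 \le i \le 2\lg(N/F)$, since the edges entering a level-$(i-1)$ subgraph are split evenly between its two halves and at most $6(2+\sqrt{2})F/2^{(i-1)/2}$ further edges are cut when it is partitioned. Solving the above recurrence, we obtain

$$E_i \le \frac{6(2+\sqrt{2})F}{2^{(i-1)/2}}\sum_{s=0}^{i-1} 2^{-s/2},$$

and thus

$$E_i \le \frac{6(2+\sqrt{2})^2 F}{2^{(i-1)/2}}.$$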
We now embed $G$ in $T_{cF,\,2\lg(N/F)}$. First, embed each of the $(2\lg(N/F))$-level subgraphs of the decomposition tree in the bottom-level meshes. This can be done if the side of each mesh at level $2\lg(N/F)$ exceeds $4F^2/N$. This is true provided

$$\frac{cF}{(\sqrt{2})^{2\lg(N/F)}} \ge \frac{4F^2}{N}.$$

Since the left-hand side equals $cF^2/N$, this inequality is satisfied for every $c \ge 4$.

Next embed the additional edges through the upper-level meshes in the natural way. No more than $2E_{i+1}$ edges pass through any $i$th level mesh. Thus the routing can be performed if the smaller side of the $i$th level meshes exceeds $2E_{i+1}$. In other words, we must have:

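$$\frac{cF}{2^{\lceil i/2 \rceil}} \ge \frac{12(2+\sqrt{2})^2 F}{2^{i/2}}.$$

A simple calculation shows that the inequality is satisfied for sufficiently large $c$; for example, $c \ge 12\sqrt{2}(2+\sqrt{2})^2$ suffices, since $2^{\lceil i/2 \rceil} \le \sqrt{2}\cdot 2^{i/2}$. ∎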
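The two numerical facts used in this proof, the solution of the recurrence for $E_i$ and the existence of a suitable constant $c$, can be sanity-checked numerically. The sketch below iterates the recurrence with equality (the worst case), compares the result against the closed form, and computes the smallest $c$ meeting both the bottom-level condition $c \ge 4$ and the routing condition. It uses $F = 1$ because both sides of each inequality scale linearly in $F$; the helper names are illustrative.

```python
import math

K = 6 * (2 + math.sqrt(2))   # Theorem 4.6: level-i cuts are at most K * F / 2^(i/2)


def edge_counts(F: float, levels: int) -> list:
    """Iterate the recurrence with equality: E_0 = 0, E_i = E_{i-1}/2 + K F / 2^((i-1)/2)."""
    E = [0.0]
    for i in range(1, levels + 1):
        E.append(E[-1] / 2 + K * F / 2 ** ((i - 1) / 2))
    return E


def closed_form(F: float, i: int) -> float:
    """The solved bound E_i <= 6 (2 + sqrt 2)^2 F / 2^((i-1)/2)."""
    return 6 * (2 + math.sqrt(2)) ** 2 * F / 2 ** ((i - 1) / 2)


def smallest_c(levels: int) -> float:
    """Smallest c with c >= 4 (bottom-level meshes) and, for every level i < levels,
    c F / 2^ceil(i/2) >= 2 E_{i+1}, using the closed form for E_{i+1}."""
    c = 4.0
    for i in range(levels):
        need = 2 * closed_form(1.0, i + 1) * 2 ** math.ceil(i / 2)
        c = max(c, need)
    return c


E = edge_counts(1.0, 40)
assert all(E[i] <= closed_form(1.0, i) for i in range(1, 41))
print(smallest_c(40), 12 * math.sqrt(2) * (2 + math.sqrt(2)) ** 2)  # both about 197.8
```

The maximum is attained at odd levels, where $2^{\lceil i/2\rceil}/2^{i/2} = \sqrt{2}$.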

Remark. Throughout the thesis, we express bounds using the term $\lg(N/F)$. For all practical purposes, $F$ is much smaller than $N$ and this term is greater than one. Should the value of $F$ be larger, however, we shall still define $\lg(N/F)$ to be at least one. Similar interpretations are assumed for $\lg\lg(N/F)$ and for $\lg\lg\lg(N/F)$. These conventions avoid the annoying (and trivial) cases when $F$ is very large without complicating the analysis further.

In the preceding embedding, all the nodes of $G$ were mapped to meshes at the bottom level of the truncated tree of meshes. Thus, edges between nodes in different meshes might have to be routed through as many as $4\lg(N/F)$ meshes. Such long edges are undesirable for a variety of reasons. It is natural to ask whether an embedding can be found in which each edge can be routed through fewer intermediate meshes. This is answered in the following theorem.
Theorem 4.8. There exist constants $c$ and $k$ such that every $N$-node graph $G$ with an $(F,\sqrt{2})$-bifurcator can be embedded in $T_{cF,\,2\lg(N/F)}$ such that no edge is routed through more than $k$ intermediate meshes.

Proof. We adopt a slight variant of the strategy used in the previous theorems. The balancing and embedding are done simultaneously and in the same manner as before, except at levels $0, k, 2k, 3k, \ldots$ (where $k$ is a constant specified later). At these levels, we embed the nodes that are incident to edges previously cut, and we cut the previously uncut edges incident to these nodes. Of course, this could triple the number of cut edges every $k$ levels, but if $k$ is sufficiently large, this happens infrequently and is not harmful. At all other levels the procedure is the same as before, using 6 colors and Lemma 4.3 to partition the decomposition tree. The process terminates after $2\lg(N/F)$ levels.

As before, the embedding is accomplished by using meshes as switching boxes for routing purposes. We must ensure that the number of edges routed through any mesh does not exceed the side lengths of the mesh. The calculation is the same as before except that the number of cut edges is tripled at every $k$th level. Thus the recurrence for $E_i$ is



$$E_i \le \frac{3^{1/k}}{2}E_{i-1} + \frac{6(2+\sqrt{2})\cdot 3F}{2^{(i-1)/2}}.$$

Here, we have (without loss of generality) increased the number of cut edges by a factor of 3 initially and by a factor of $3^{1/k}$ at each level, instead of increasing the number of cuts by a factor of 3 at every $k$th level. Solving the recurrence, we find

$$E_i \le \frac{18(2+\sqrt{2})F}{2^{(i-1)/2}}\sum_{j \ge 0}\left(\frac{3^{1/k}}{\sqrt{2}}\right)^{j}.$$

For $k \ge 4$, the sum converges to a constant, since then $3^{1/k} < \sqrt{2}$. The remaining analysis is the same as in the previous theorems except that the constants are larger. ∎
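The threshold on $k$ comes from the ratio of the geometric sum: the series converges exactly when $3^{1/k} < \sqrt{2}$, that is, when $k > 2\lg 3 \approx 3.17$. The short sketch below (helper names are illustrative) finds the smallest such integer and evaluates the resulting constant factor $1/(1 - 3^{1/k}/\sqrt{2})$.

```python
import math


def ratio(k: int) -> float:
    """Common ratio 3^(1/k) / sqrt(2) of the geometric sum in the bound on E_i."""
    return 3 ** (1 / k) / math.sqrt(2)


def smallest_convergent_k() -> int:
    """Smallest integer k with 3^(1/k) < sqrt(2), i.e. k > 2 lg 3."""
    k = 1
    while ratio(k) >= 1:
        k += 1
    return k


def sum_factor(k: int) -> float:
    """Value of sum_{j >= 0} ratio(k)^j = 1 / (1 - ratio(k)); only defined when ratio(k) < 1."""
    q = ratio(k)
    if q >= 1:
        raise ValueError("the series diverges for this k")
    return 1 / (1 - q)


print(smallest_convergent_k(), sum_factor(4), sum_factor(8))
```

Larger values of $k$ shrink the constant factor, at the price of allowing edges to pass through more intermediate meshes.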

Remark. It is worthwhile to point out here that Theorems 4.7 and 4.8 could also have been proved using Lemma 4.1 instead of Lemma 4.2. The nodes of $G$ would still be balanced in the decomposition tree, but the cut edges could only be split in a $1/3$ to $2/3$ ratio at each decomposition.
Figure 4.4: The H-layout of the tree of meshes.

While this increases the value of the sum, it still converges to a constant. (This is because, for sufficiently large $k$, $\frac{2\sqrt{2}}{3}\,3^{1/k} < 1$.) Hence, $k$ and $c$ would be larger, but the statements of the theorems remain the same.



4.5. Layouts for the Tree of Meshes 

Thus far we have considered only the problem of embedding graphs in the tree of meshes. How do we lay out the tree of meshes efficiently? Clearly, any layout for the tree of meshes also gives a layout for every graph that can be embedded within the tree of meshes. In this section we develop two different layouts for the tree of meshes.

The first layout is a straightforward modification of the "H-tree" layout for complete binary trees [55]. The modified layout is obtained by expanding each node of the complete binary tree into a mesh of the appropriate size. Figure 4.4 shows this layout. It is easy to see that if $S(F)$ denotes the side of the layout for $T_F$, then $S(1) = 1$, and

$$S(F) \le 2S(F/2) + O(F),$$

which gives $S(F) = O(F\lg F)$. This means that the area of the layout for $T_F$ is bounded by $O(F^2\lg^2 F)$. As shown in [40, 41], this bound is optimal.
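Unrolling the recurrence makes the $O(F\lg F)$ bound concrete. The sketch below fixes the constant in the $O(F)$ term to a parameter `scale`, an assumption made only for illustration; with that choice the recurrence solves exactly to $S(F) = F + \mathit{scale}\cdot F\lg F$.

```python
def h_layout_side(F: int, scale: int = 1) -> int:
    """Side S(F) of the H-layout of T_F, with the O(F) term taken to be scale * F:
    S(1) = 1 and S(F) = 2 S(F/2) + scale * F, for F a power of 2."""
    if F == 1:
        return 1
    return 2 * h_layout_side(F // 2, scale) + scale * F


for e in range(6):
    F = 2 ** e
    print(F, h_layout_side(F), F * (e + 1))  # the last two columns agree when scale = 1
```

Squaring the side gives the $O(F^2\lg^2 F)$ area quoted in the text.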

For truncated trees of meshes, such as those considered in Theorems 4.7 and 4.8, a similar result holds.
Theorem 4.9. The truncated tree of meshes $T_{F,\,2\lg(N/F)}$ has a layout of area $O(F^2\lg^2(N/F))$.
Proof. The obvious restriction of the H-layout to the top levels suffices. ∎

Although the mesh edges in the layout shown in Figure 4.4 have length 1, the edges between meshes can be quite long (nearly half the side of the layout). By pulling meshes in closer towards the top level, we can reduce the length of the longest edge considerably. This technique was introduced in Chapter 3 to produce minimax edge length layouts for trees, and generalizes to graphs with known bifurcators. This layout will later be used to find layouts with short edges for graphs embedded within the truncated tree of meshes.

Theorem 4.10. The truncated tree of meshes $T_{F,\,2\lg(N/F)}$ can be laid out in area $O(F^2\lg^2(N/F))$ so that mesh edges have length 1 and edges between meshes have length at most $O(F\lg(N/F)/\lg\lg(N/F))$.

Proof. Consider the H-tree layout of a complete binary tree of height $2\lg\lg\lg(N/F)$, and having $(\lg\lg(N/F))^2$ leaves. Expand each linear dimension by a factor $\beta = \Theta(F\lg(N/F)/\lg\lg(N/F))$, so that each edge of the H-tree layout becomes a channel of width $\beta$ and each node becomes a $\beta \times \beta$ square. The resulting area is $(\beta\lg\lg(N/F))^2 = \Theta(F^2\lg^2(N/F))$.

Since the channels are much wider than the side of any mesh, we can stack many meshes within one channel. In particular, as seen in Figure 4.5, we embed the top-level mesh at the center of the layout with the second-level meshes on either side. In the first stage of the layout, the meshes in the top levels are placed together in a breadth-first manner. Meshes at successive levels are equally spaced at distance $\Theta(F\lg(N/F)/\lg\lg(N/F))$ apart.

We need to ensure that every channel is wide enough to accommodate the meshes stacked within it. To this end, let us suppose that all meshes embedded in the first stage are stacked together in the same channel. Of course, this is a gross overestimate, but it suffices for our argument. Since the path from the root to a leaf in the original $(\lg\lg(N/F))^2$-leaf H-layout has length $\Theta(\lg\lg(N/F))$, a total of $c\lg\lg(N/F)$ levels of $T_{F,\,2\lg(N/F)}$ are embedded in the first stage. The value of the constant $c$ depends on the values of the other constants in the $\Theta$-terms and can be made as small as necessary.

The total number of meshes embedded in the first stage is no more than $2^{1 + c\lg\lg(N/F)}$. Each
Figure 4.5: An improved layout for the tree of meshes.



mesh has side length no greater than $F$, so to stack all these meshes within one channel of side $\beta$ it suffices to have

$$2^{1 + c\lg\lg(N/F)}F = 2F(\lg(N/F))^{c} \le \beta = \Theta\!\left(\frac{F\lg(N/F)}{\lg\lg(N/F)}\right),$$

which is easily satisfied when $c < 1/2$. Hence every channel has sufficient width to stack all the $i$th level meshes across the channel for any $i \le c\lg\lg(N/F)$.

In the second stage, we embed the remaining meshes in the $\beta \times \beta$ squares. A total of $\Theta\big((\lg(N/F))^c/(\lg\lg(N/F))^2\big)$ copies of an $O(\lg(N/F))$-level truncated tree of meshes whose top mesh is $F/(\lg(N/F))^{c/2} \times F/(\lg(N/F))^{c/2}$ must be embedded in each of the $(\lg\lg(N/F))^2$ regions of size $\beta \times \beta$ to accomplish this. Using the layout described in Theorem 4.9 for each copy, the total area required in each region is

$$\Theta\!\left(\frac{(\lg(N/F))^c}{(\lg\lg(N/F))^2}\cdot\frac{F^2\lg^2(N/F)}{(\lg(N/F))^c}\right) = \Theta\!\left(\frac{F^2\lg^2(N/F)}{(\lg\lg(N/F))^2}\right).$$

This is precisely the amount of area available in each $\beta \times \beta$ region. Hence the embedding is possible.

It remains to verify that the edges between meshes have length $O(F\lg(N/F)/\lg\lg(N/F))$. This is easily done since meshes in adjacent levels were spaced distance $\Theta(F\lg(N/F)/\lg\lg(N/F))$ apart in the first stage, and since meshes in adjacent levels were located in the same $\beta \times \beta$ region in the second stage. ∎



CHAPTER 5 



Solving the Layout Problems 



Using the framework described in the previous chapter, we are now ready to present general solutions to the eight problems posed in Chapter 2. The layout framework of Chapter 4 applies directly to most of these problems, supporting our belief that the divide-and-conquer strategy based on bifurcators is an efficient paradigm for VLSI graph layout. In particular, the tree of meshes emerges as an extremely versatile network for graph layout. While specific instances of some problems might be better solved using different techniques, the framework provides a novel and uniform approach for VLSI layout which effectively addresses various unrelated issues. The solutions presented in this chapter are evaluated by comparing them with known lower bounds.

Problem 1. Given a graph G, produce an area-efficient layout for G. 

By Theorem 4.7, every $N$-node graph with an $(F,\sqrt{2})$-bifurcator can be embedded in the truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$. Next, by Theorem 4.9, the truncated tree of meshes can be laid out in $O(F^2\lg^2(N/F))$ area. Therefore, every $N$-node graph with an $(F,\sqrt{2})$-bifurcator can be laid out in $O(F^2\lg^2(N/F))$ area.

As a consequence of Lemma 4.4, every $N$-node graph whose smallest $\sqrt{2}$-bifurcator is $F$ must occupy at least $F^2$ area, for otherwise the graph would have a $\sqrt{2}$-bifurcator strictly smaller than $F$. Therefore, for every graph the upper bound is at most a factor of $O(\lg^2(N/F))$ worse than optimal, i.e., the area bound is universally close to optimal.

The bounds are also existentially optimal. Leighton [7, 42] has shown the existence of $N$-node graphs with minimum $\sqrt{2}$-bifurcator $F$ which require area at least $\Omega(F^2\lg^2(N/F))$. In other words, no strategy based on bifurcators alone can asymptotically improve upon the divide-and-conquer framework.

Special Cases. Graphs with $(F,\sqrt{2})$-bifurcators with either of the special forms described in Section 4.2.1 have $O(F^2)$-area layouts. Thus, for example, $N$-node trees have $O(N)$-area layouts.

Problem 2. Given a graph $G$, produce an area-efficient layout for $G$ with minimax edge
length. 

From Theorem 4.8 we know that every $N$-node graph with an $(F,\sqrt{2})$-bifurcator can be embedded in the truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$ so that no edge passes through more than a constant number of intermediate meshes. Furthermore, the layout for the truncated tree of meshes given in Theorem 4.10 guarantees that every edge between meshes has length bounded by $O(F\lg(N/F)/\lg\lg(N/F))$ and that every edge within a mesh has length one. Combining these two theorems, we see that every $N$-node graph with an $(F,\sqrt{2})$-bifurcator has an $O(F^2\lg^2(N/F))$-area layout with maximum edge length bounded by $O(F\lg(N/F)/\lg\lg(N/F))$.

This bound, too, is existentially optimal [7]. In other words, there exist $N$-node graphs with minimum $\sqrt{2}$-bifurcator $F$ whose minimax edge length is $\Omega(F\lg(N/F)/\lg\lg(N/F))$.

Unfortunately, the bounds are not universally close to optimal. The only general lower bound on minimax edge length for $N$-node graphs whose minimum $\sqrt{2}$-bifurcator is $F$ is $\Omega(F^2/N)$. This general lower bound is also existentially optimal.

The problem of minimizing maximum edge length appears to be quite difficult. Although the preceding bounds are disappointingly weak, they are the best known. Recall that in Chapter 3 we showed that even determining whether a tree can be laid out with minimax edge length one is NP-complete.

Special Cases. The minimax edge length bounds for graphs with special $(F,\sqrt{2})$-bifurcators are $O(\sqrt{N}/\lg N)$ for type A $\sqrt{2}$-bifurcators and $O(F)$ for type B $\sqrt{2}$-bifurcators.
Problem 3. Given a graph, produce an area-efficient layout in which each wire has bounded delay in the capacitive model.

First we formalize some details of the model. As usual, a graph describes a connection of 
processors, with an edge corresponding to a bidirectional link between two processors. Each 
node is a processing element which contains one driver and one receiver for each incident edge. 
Every transistor in a processing element has the same size. Thus, in our layouts, a node may 
be represented by a long and skinny box of constant thickness, with length equal to the area 
of an internal transistor. Since each node has bounded degree, a box will be just big enough 
to contain all the transistors in the corresponding processor. Note that different nodes in the 
layout will have different lengths, but the same thickness. We assume that the grid spacing is 
adjusted so that nodes and edges have unit thickness and may be laid along grid lines. Although 
wires are allowed to cross, we will not allow nodes to cross; this corresponds to transistors not overlapping. Similarly, wires and nodes may not cross. The propagation delay over a wire of length $l$ driven by a transistor of area $D$ with capacitive load $A$ is proportional to $(l + A)/D$.
The capacitive load presented to a transistor equals the sum of incident wire lengths and areas 
of adjacent transistors. 

Theorem 5.1. Every $N$-node graph $G$ with an $(F,\sqrt{2})$-bifurcator has a bounded-delay layout of area $O(F^2\lg^2(N/F))$.

Proof. As in Theorem 4.8, embed G in a tree of meshes so that adjacent nodes are mapped 
to meshes no more than a constant number of levels apart. Since the dimensions of meshes at 
successive levels, as well as the lengths of edges connecting adjacent meshes in the layout of 
Theorem 4.9, decrease at the same geometric rate, we know that the length of an edge of G is 
proportional to the side lengths of the meshes that contain the corresponding nodes. Assign to 
each node an area that is proportional to the side lengths of the mesh in which it is embedded. 
Thus, the capacitive load on any node, which equals the sum of the areas of all the incident 
edges and adjacent nodes, is proportional to the area of the node. In other words, every wire 
in the layout has bounded delay. 
Figure 5.1: Laying out expanded nodes in a mesh.

We need to ensure that each enlarged node can be accommodated in its assigned mesh without blowing up the area of the layout by more than a constant factor. This can be done by increasing the dimensions of each mesh by a constant factor, and laying out the nodes and incident edges as shown in Figure 5.1. Notice that the nodes do not overlap other nodes or wires. The area of each node remains proportional to the side lengths of the mesh containing it, and thus the delay across every wire is bounded. ∎
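The scaling argument in this proof can be illustrated with a toy model. The sketch below is not the construction itself: it assumes, as the proof does, that node areas and wire lengths are both proportional to the side of the mesh holding the node and that neighbors lie a bounded number of levels away, and it sums the load over incident edges as in the proof's accounting. The constants `wire_scale` and `node_scale`, the level offsets, and the function names are hypothetical. Because every quantity scales with the same mesh side, the computed delay is the same at every level and for every $F$.

```python
import math


def wire_delay(length: float, driver_area: float, load_area: float) -> float:
    """Capacitive-model delay, up to a constant factor: (l + A) / D."""
    return (length + load_area) / driver_area


def node_delay(F: float, i: int, offsets=(-1, 0, 1, 2),
               wire_scale: float = 3.0, node_scale: float = 1.0) -> float:
    """Delay for a node in a level-i mesh of side F / 2^(i/2) whose neighbors sit in
    meshes at levels i + d for d in `offsets` (hypothetical toy parameters)."""

    def side(level: int) -> float:
        return F / 2 ** (level / 2)

    driver = node_scale * side(i)                   # node area proportional to its mesh side
    total_length = sum(wire_scale * max(side(i), side(i + d)) for d in offsets)
    neighbor_area = sum(node_scale * side(i + d) for d in offsets)
    return wire_delay(total_length, driver, neighbor_area)


base = node_delay(1024.0, 1)
print(all(math.isclose(node_delay(1024.0, i), base) for i in range(1, 20)))  # True
```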

Special Cases. Similarly, graphs with special $(F,\sqrt{2})$-bifurcators have $O(F^2)$-area bounded-delay layouts. Thus, for example, every $N$-node tree has an $O(N)$-area bounded-delay layout.

Theorem 5.1 implies that the area bounds for bounded-delay layouts are no worse than 
the best known general area bounds for Problem 1. However, it is not known whether or not 
there exists a graph for which any bounded-delay layout requires asymptotically greater area 
than the minimum area layout. In the following corollary, we show that any increase in area 
need not be large. 



Corollary 5.2. Any layout of area $A$ for an $N$-node graph can be transformed into a bounded-delay layout of area $O(A\lg^2(N/\sqrt{A}))$.

Proof. By Lemma 4.4, an area-$A$ layout yields a $(\sqrt{A},\sqrt{2})$-bifurcator, which can be quickly found. Next, by Theorem 5.1, a bounded-delay layout of area $O(A\lg^2(N/\sqrt{A}))$ can be easily constructed. Observe that this transformation is effective. ∎
Problem 4. Given a graph $G$, produce a layout for $G$ with few wire crossings.

The layouts for the truncated tree of meshes in Theorems 4.9 and 4.10 do not have any edge crossings. Since every $N$-node graph $G$ with an $(F,\sqrt{2})$-bifurcator can be embedded within the truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$, this means that the number of crossings in the layout for $G$ cannot exceed the number of nodes in $T_{O(F),\,2\lg(N/F)}$. In other words, the number of crossings in the layout for $G$ is bounded by $O(F^2\lg(N/F))$.

Once again, this bound is existentially optimal [7]. Moreover, if the minimum $\sqrt{2}$-bifurcator $F$ of an $N$-node graph is asymptotically greater than $\sqrt{N}$, the number of crossings in the layout for $G$ is no more than a factor $O(\lg(N/F))$ times optimal.

Special Cases. Graphs with special $(F,\sqrt{2})$-bifurcators can be laid out with $O(F^2)$ crossings.

Problem 5. Given a graph, produce an area-efficient regular layout for the graph.

In Theorem 4.7, we showed how to embed any $N$-node graph $G$ with an $(F,\sqrt{2})$-bifurcator in $T_{cF,\,2\lg(N/F)}$ for some constant $c$. Moreover, the nodes of $G$ were divided evenly among the $N^2/F^2$ bottom-level meshes of $T_{cF,\,2\lg(N/F)}$, and in each bottom-level mesh, the nodes of $G$ were embedded in a regular fashion. Thus to produce an $O(F^2\lg^2(N/F))$-area layout for $G$ that is regular, we need only produce a layout for $T_{cF,\,2\lg(N/F)}$ for which the nodes at the $(2\lg(N/F))$th level are located in a regular fashion. In fact, we can do much better, as we show in the following theorem.

Theorem 5.3. The truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$ can be laid out in $O(F^2\lg^2(N/F))$ area so that, for every level $i$, all nodes within $i$th level meshes are placed in a regular fashion.

Proof. The first step is to construct a $\Theta(\lg(N/F))$-layer three-dimensional layout [46] of the truncated tree of meshes. Fold the connections between the root of the tree of meshes and each of its two sons so that the sons fit naturally on a second layer over the root mesh. Fold the connections to each of the meshes at the next lower level so they fit on the third layer, directly over the meshes on the second layer, and so forth. This generates a $\Theta(\lg(N/F))$-layer three-dimensional layout, with each layer occupying linear area. By projecting the three-dimensional layout onto the plane in the manner of Thompson [80, pp. 36-38], the result follows. (The same layout can be constructed by interleaving the meshes at each level.) ∎

Special Cases. The $O(F^2)$-area layouts for graphs with special $\sqrt{2}$-bifurcators are also regular.

Problem 6. Design area-efficient chips that can be configured to realize a large number
of graphs. 

In Theorem 4.7 we showed that every $N$-node graph with an $(F,\sqrt{2})$-bifurcator can be embedded in a truncated tree of meshes such that the nodes of the graph are embedded in a regular fashion in the bottom-level meshes of $T_{cF,\,2\lg(N/F)}$. In fact, the nodes can be mapped to fixed positions within the meshes. Therefore, if we lay out the truncated tree of meshes on a chip with processors at these fixed positions, we have a configurable chip for all graphs with the corresponding bifurcator. This yields the following result. Observe that the area bounds for configurable layouts are the same as for unrestricted layouts.

Theorem 5.4. Every $N$-node graph with an $(F,\sqrt{2})$-bifurcator has a configurable layout of area $O(F^2\lg^2(N/F))$.

Proof. Simply make the connections in the meshes after the rest of the chip has been fabricated. Recall that we used the meshes as crossbar switches in Theorem 4.7. ∎

Special Cases. Similarly, graphs with special bifurcators have $O(F^2)$-area configurable layouts. The $O(N)$-area restructurable tree layout of Chapter 3 is such an example.

Problem 7. On a wafer which has arbitrarily distributed defective cells, realize a given graph on the good cells.

Theorem 4.7 showed how to embed any $N$-node graph $G$ with an $(F,\sqrt{2})$-bifurcator in the truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$. The embedding had the property that nodes of the graph could be mapped to fixed positions within the meshes at the bottom level. Accordingly, we fixed processors at each of these positions.
Faulty processors on a wafer therefore correspond to faulty processors in the truncated tree of meshes, the correspondence being induced via the layout for the tree of meshes. It is clearly no longer possible to realize $G$ in the faulty tree of meshes. However, it is possible to realize a smaller graph with a similar structure using only the functioning processors.

More formally, consider a class of graphs for which any $N$-node graph in the class has a $\sqrt{2}$-bifurcator of size $O(f(N))$, where the function $f$ is such that $f(x)/\sqrt{x}$ is nondecreasing for increasing $x$. For example, $f(x) = \sqrt{x}$ for the class of square meshes (as well as for the class of trees or the class of planar graphs). In what follows, we will show how to embed any $M$-node graph from the class in any $T_{cf(N),\,2\lg(N/f(N))}$ that has $M$ functioning processors, where $N \ge M$ and $c$ is a sufficiently large constant.

In particular, we will show how to embed $T_{f(M),\,2\lg(M/f(M))}$ in the faulty tree of meshes. By applying Theorem 4.7 to the smaller tree of meshes embedded within the faulty one, this will prove our claim. Thus the layout strategy developed in Chapter 4 is impervious to the existence of faulty processors. This result substantially generalizes and simplifies a similar result proved by Leighton and Leiserson for embedding meshes around faults in [45].

Theorem 5.5. Given the preceding constraints on $N$, $M$, $c$ and $f$, a completely functioning truncated tree of meshes $T_{f(M),\,2\lg(M/f(M))}$ with $M$ processors can be embedded in any partially functioning truncated tree of meshes $T_{cf(N),\,2\lg(N/f(N))}$ with $N$ processors ($M$ of which are functioning) so that the processors of the former are mapped onto the functioning processors of the latter.

Proof. Label the functioning processors in each tree of meshes from 1 to $M$ by counting from left to right across the bottom level of each graph. (Recall that the processors are evenly distributed on the bottom level.) Map the $k$th processor of $T_{f(M),\,2\lg(M/f(M))}$ onto the $k$th functioning processor of $T_{cf(N),\,2\lg(N/f(N))}$. Route the edges of the former graph through the meshes of the latter in the usual way, at the same time embedding meshes of the former in blocks within the meshes of the latter.

It remains to show that the capacity of each mesh in $T_{cf(N),\,2\lg(N/f(N))}$ is sufficient for the embedding. Consider a mesh $X$ on the $i$th level of $T_{cf(N),\,2\lg(N/f(N))}$. This mesh has side length $cf(N)/2^{i/2}$ and at most $N/2^i$ functioning processors below it in the bottom level of the graph. The only meshes and edges of $T_{f(M),\,2\lg(M/f(M))}$ that are embedded in $X$ are those that correspond to roots of the forest of complete binary trees formed by removing the corresponding interval of (at most $N/2^i$) processors in $T_{f(M),\,2\lg(M/f(M))}$. These roots are identified by splitting $T_{f(M),\,2\lg(M/f(M))}$ (as in Lemma 4.3) at the two endpoints of the interval. There are at most two roots at each level in the resulting forest, and the sum of their side lengths (a geometrically decreasing sum) is proportional to $f(M)/2^{j/2}$, where $j$ is such that $M/2^j \le N/2^i$. (Remember that there are at most $N/2^i$ processors in the leaves of the forest, so that the height of the largest complete binary tree in the forest is $j$, where $M/2^j \le N/2^i$.) Thus the sum of the side lengths of the meshes embedded in $X$ is $O\big(f(M)\sqrt{N/M}/2^{i/2}\big)$, which, for sufficiently large $c$, is less than $cf(N)/2^{i/2}$ (the side length of $X$), since $N \ge M$ and $f(x)/\sqrt{x}$ is a nondecreasing function. Hence $X$ is large enough and the embedding is possible. ∎
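Both ingredients of this proof are easy to state in code. The sketch below (helper names hypothetical) gives the order-preserving map from the fault-free tree's processors onto the functioning positions, so that an interval of one bottom row corresponds to an interval of the other, and checks the inequality $f(M)\sqrt{N/M} \le f(N)$ that makes each mesh large enough. The function $f(x) = \sqrt{x}\lg x$ used in the checks is only one example of a function with $f(x)/\sqrt{x}$ nondecreasing.

```python
import math
from typing import Callable, List, Sequence


def map_to_functioning(alive: Sequence[bool], M: int) -> List[int]:
    """Send the k-th processor of the fault-free tree (k = 0, ..., M-1) to the k-th
    functioning bottom-level position of the faulty tree, counting left to right."""
    good = [pos for pos, ok in enumerate(alive) if ok]
    if len(good) < M:
        raise ValueError("fewer than M functioning processors")
    return good[:M]


def capacity_ok(f: Callable[[float], float], M: float, N: float) -> bool:
    """Key inequality of the capacity argument: f(M) * sqrt(N / M) <= f(N), which holds
    whenever N >= M and f(x) / sqrt(x) is nondecreasing (small float tolerance)."""
    return f(M) * math.sqrt(N / M) <= f(N) * (1 + 1e-12)


alive = [True, False, True, True, False, True, True, False]
print(map_to_functioning(alive, 5))   # [0, 2, 3, 5, 6]
```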

Special Cases. A similar argument works for graphs with special bifurcators. 

Problem 8. Given a graph G, assemble G using the minimum number of copies of a 
single chip having few external pin connections. 

Suppose that we wish to assemble $N$-node graphs with $(F,\sqrt{2})$-bifurcators but that each chip contains only $m$ nodes, where $m < N$. Consider a chip consisting of a truncated tree of meshes $T_{O(F\sqrt{m/N}),\,2\lg(\sqrt{mN}/F)}$ with the $m$ processors divided equally among the bottom-level meshes, and external pin connections to the top of the top-level mesh. Two copies of this chip may be wired together to form a truncated tree of meshes with $2m$ processors. Thus, graphs with twice as many processors can be assembled with two chips as can be assembled on a single chip. More generally, we have the following result.
Theorem 5.6. There is a universal restructurable chip with $m$ processors and $O(F\sqrt{m/N})$ external pins, occupying area $O\big((F^2m/N)\lg^2(\sqrt{mN}/F)\big)$, such that every $N$-node graph with an $(F,\sqrt{2})$-bifurcator can be assembled using multiple copies of the universal chip. Furthermore, the number of chips used in the assembly is the minimum possible.

Proof. Consider the top $\lg N - \lg m$ levels of a fully balanced decomposition tree of
$G$. Each of the subgraphs at level $\lg N - \lg m$ has $N/2^{\lg N - \lg m} = m$ nodes, and has a $\sqrt{2}$-bifurcator
of size $O(F\sqrt{m/N})$. By Theorem 4.7, each of these subgraphs can be realized with a
single universal chip consisting of a truncated tree of meshes whose area is
bounded by $O\big((F^2 m/N)\lg^2(\sqrt{mN}/F)\big)$, and which has $O(F\sqrt{m/N})$ external pin connections. To complete
the assembly, the chips are wired up by making connections between pins on different chips as
given by the decomposition tree. ∎

A consequence of this result is that when $F = O(\sqrt{N})$, the restructurable chip
has $O(\sqrt{m})$ pins, which is independent of the size of the network to be assembled. This is the
best possible. To realize networks with larger bifurcators, the parameters of the restructurable
chip depend on the size of the network assembled.

Special Cases. For graphs with special bifurcators, the same is true except that only $O(F^2)$
area is used on each chip. For type A $\sqrt{2}$-bifurcators, the number of pins needed is much lower.
For example, $N$-node trees require only $O(\lg m)$ pins per chip (Theorem 3.9). As is the case for
all planar graphs, the number of pins does not depend on the number of nodes. This is because
$N$-node planar graphs have $\sqrt{2}$-bifurcators of size $O(\sqrt{N})$.



CHAPTER 6 



The Channel Routing Problem 



While the layout problems considered in Part I provide new insights and paradigms for 
VLSI graph layout, they are nevertheless abstractions of problems encountered by current 
automatic layout systems. In this second part (Chapters 6 and 7) we shall study the widely en- 
countered channel routing problem which forms the basis of a popular paradigm for automatic 
layout. 

The typical routing problem is characterized by a set of rectangular modules with terminals 
at fixed positions along module boundaries. Labels on the terminals specify the required 
connections - all terminals with the same label must be electrically connected. The problem is 
to wire together all terminals that have the same label. 

Most layout systems proceed in two phases: placement and routing. In the placement phase 
the modules are located at fixed positions, and the required connections are later made in the 
routing phase by running wires around and in between the modules. Of course, the two phases 
go hand-in-hand; a placement for which a complete routing is impossible is of little use. The 
intractability of obtaining optimal solutions in either phase demands that efficient heuristics 
be developed for practical use. 

Introduced by Hashimoto and Stevens in 1971 [31], channel routing has become a very
popular and successful heuristic for routing integrated circuits. As illustrated in Figure 6.1,
after the modules have been placed, the chip is heuristically partitioned into a set of rectangular
channels, and each channel is assigned a set of wires which are to pass through it. This
effectively reduces a difficult "global" wiring problem to a set of disjoint (and presumably
easier) "local" channel routing subproblems.
Figure 6.1: Reducing the global wiring problem into a set
of channel routing subproblems.

The performance of the overall strategy is largely determined by the algorithm used to solve 
the individual channel routing subproblems. For this reason, the channel routing problem has 
been intensively studied for over a decade, and many heuristic algorithms have been proposed 
for solving the problem [1, 2, 11, 12, 18, 20, 21, 34, 35, 36, 38, 51, 60, 62, 67, 68, 81, 84]. 
Although many of these heuristics have proved reasonably successful in practice, there are
instances (albeit theoretical) in which the heuristics either produce arbitrarily bad solutions or
fail to produce any solution. Chapter 7 presents a fast approximation algorithm which is
guaranteed to produce a solution close to optimal. The remainder of this chapter, however,
poses the problem in a formal framework and briefly reviews some of the previous work on 
channel routing. 



6.1. Manhattan Routing Within Channels 

The channel routing problem may be described as follows. A channel consists of a two-layer 
rectangular grid of columns and tracks (rows). Terminals are located on the top and bottom 
tracks at grid points. The number of tracks between the top and bottom tracks is the width of 
the channel. Each set of terminals to be electrically connected constitutes a net, and distinct 
nets are disjoint. A net with r terminals is called an r-point net. The width may be varied 
by moving the tracks vertically; however, the tracks are not allowed to slide horizontally. In 
other words, the columns are fixed. We also assume that there are no trivial nets (two-point
nets with both terminals in the same column).

Figure 6.2: Manhattan routing within a channel. Vertical
cuts measure channel density.

The objective of the channel routing problem is to wire together all terminals in each net 
in a way which minimizes channel width. Wires may be routed on either layer, along any 
track between the top and bottom tracks, and along any column. There is no restriction on 
the number of columns at either end. Electrically disjoint wires may cross at grid points on 
different layers, but may not overlap for any distance even on different layers. A wire may 
change layers at a grid point, in which case no other electrically disjoint wire may pass through 
that grid point on either layer. 

In the Manhattan wiring model, these constraints are satisfied by restricting all horizontal 
wire segments to lie on one layer, and all vertical segments to lie on the other layer. For a wire 
to turn a corner it has to change layers, which requires a contact cut. Clearly, distinct wires 
cannot share a corner since that would violate the constraint that only one wire may change 
layers at any point. For obvious reasons, Manhattan routing is also referred to as layer per 
direction or reserved layer routing. Figure 6.2 illustrates an example of Manhattan routing in 
a channel. 



Remark. The channel routing problem described above is a simpler version of switchbox routing,
in which terminals are located on all sides of a rectangular channel. In many instances, such
as when two large modules are placed next to each other, terminals lie only along two opposite
sides of a channel. For this reason, and because the switchbox routing problem is much more
difficult, engineers have focused attention primarily on the simpler channel routing problem.
6.2. Bounds on Channel Width

Consider a vertical cut which slices the channel in two (see Figure 6.2). Every net which 
has a terminal on both sides of the cut is said to be split by the cut. Since at least one wire 
must cross the vertical cut for each split net, it follows that at every point the channel must 
be at least as wide as the number of nets split by a vertical cut through that point. In short, 
channel width can be no less than channel density, which is defined as the maximum number 
of nets split by a vertical cut. For example, the channel of Figure 6.2 has density three. 
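The definition of density is easy to compute directly. The Python sketch below is illustrative only: the channel representation (two equal-length lists `top` and `bottom`, one entry per column, each holding a net label or `None` for an empty grid point) and the function name `channel_density` are our own assumptions, and vertical cuts are taken to fall between adjacent columns.

```python
from typing import Hashable, Optional, Sequence

Net = Optional[Hashable]  # a net label, or None for an empty grid point

def channel_density(top: Sequence[Net], bottom: Sequence[Net]) -> int:
    """Maximum number of nets split by a vertical cut between adjacent columns."""
    if len(top) != len(bottom):
        raise ValueError("top and bottom tracks must have the same number of columns")
    c = len(top)
    span = {}  # net -> (leftmost column, rightmost column) of its terminals
    for col in range(c):
        for net in (top[col], bottom[col]):
            if net is not None:
                lo, hi = span.get(net, (col, col))
                span[net] = (min(lo, col), max(hi, col))
    # A net spanning columns lo..hi is split by every cut just right of
    # columns lo, lo + 1, ..., hi - 1; record this in a difference array.
    diff = [0] * (c + 1)
    for lo, hi in span.values():
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    best = running = 0
    for x in range(c):  # running = number of nets split by the cut right of column x
        running += diff[x]
        best = max(best, running)
    return best
```

For the shift-one channel of Theorem 6.1, every cut splits exactly one net, so the sketch returns 1 regardless of the number of nets.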

Can every channel with density $d$ be routed in $O(d)$ tracks? In practice, most channels can
be routed in $d$ plus two or three tracks. In general, however, this is far from the truth. Brown
and Rivest [14] gave examples of two-point net channels with $n$ nets whose density is
one, but for which channel width can be no less than $\sqrt{2n} - 1$. Since we shall employ an identical
argument later, their result is rederived below.

Theorem 6.1 (Brown-Rivest). Consider the two-point, $n$-net (shift-one) channel in
which terminal $i$ is located in column $i$ on the top track, and in column $i + 1$ on the bottom
track. Any Manhattan routing for this channel must have width at least $\sqrt{2n} - 1$.

Proof. Suppose that a routing of width $w$ is given. Since the top and bottom terminals
of any net lie in different columns, each wire in the routing must use a horizontal track to
change columns at least once. Now, if a wire changes from column $i$ to column $j$ along track $y$
($1 \le y \le w$), then either the vertical segment from $(j, y-1)$ to $(j, y)$ or the segment from $(j, y)$ to $(j, y+1)$
cannot have a wire laid on it. Otherwise, as seen in Figure 6.3, two different nets will overlap
at point $(j, y)$.

In other words, whenever a wire changes columns within the channel, it must change to a 
blank column, one which has no wire in one incident vertical segment. A wire may also change 
columns by exiting across a side of the channel along a horizontal track. 

How many wires can change columns along the first horizontal track? Since all grid points
on the top track are occupied, a wire can change columns only by exiting the channel. But,
since segment overlaps are prohibited, at most two wires can change columns in this way.
Figure 6.3: A wire can only turn into a blank spot.

Observe that whenever a wire exits the channel, one blank segment is created along a column. 

The number of wires that can change columns on any horizontal track is bounded by 
the number of blank vertical segments incident to that track, plus two (for wires that exit the 
channel). If 2 wires change columns on the first horizontal track, this creates two empty vertical 
segments incident to the second track, so that 4 wires can change columns on the second track, 
and so on. In general, it is easy to see that the number of wires that can change columns on
track $y$ is at most $2y$ when $y \le \lfloor w/2 \rfloor$ and at most $2(w + 1 - y)$ otherwise.

Summing over all horizontal tracks, the total number of wires that can change columns is
consequently no greater than

$$\sum_{y=1}^{\lfloor w/2 \rfloor} 2y \;+ \sum_{y=\lfloor w/2 \rfloor + 1}^{w} 2(w - y + 1),$$

which never exceeds $\frac{1}{2}(w+1)^2$ (the sum equals $w(w+2)/2$ when $w$ is even and $(w+1)^2/2$
when $w$ is odd). Finally, since every wire connecting a net has to change columns, we have

$$\tfrac{1}{2}(w+1)^2 \ge n,$$

or $w \ge \sqrt{2n} - 1$, thus proving the result. ∎
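The counting in this proof can be checked numerically. In the sketch below (the function names are ours, not the source's), `max_column_changes(w)` adds up the per-track budget used above, $2y$ on the upper half of the tracks and $2(w + 1 - y)$ on the lower half, and `one_shift_width_lower_bound(n)` returns the smallest width whose budget reaches $n$. For odd $w$ the budget equals $\frac{1}{2}(w+1)^2$ exactly, which is why the bound is a non-strict inequality.

```python
import math

def max_column_changes(w: int) -> int:
    """Budget from the proof of Theorem 6.1: the most wires that can change
    columns in a width-w Manhattan routing of the shift-one channel."""
    total = 0
    for y in range(1, w + 1):
        # 2y on the upper half of the tracks, 2(w + 1 - y) on the lower half
        total += 2 * y if y <= w // 2 else 2 * (w + 1 - y)
    return total

def one_shift_width_lower_bound(n: int) -> int:
    """Smallest width w whose budget covers all n nets (each must change columns)."""
    w = 0
    while max_column_changes(w) < n:
        w += 1
    return w

for n in (8, 50, 1000):
    print(n, one_shift_width_lower_bound(n), math.sqrt(2 * n) - 1)
```

The printed pairs let one compare the counting bound with the closed form $\sqrt{2n} - 1$.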



An obvious question that arises is: Can every channel be quickly routed in minimum width? 
Unfortunately, the general problem is NP-complete [77], and remains NP-complete even for 
two-point nets [77, 78]. This might help explain why none of the current heuristics is even 
guaranteed to find solutions that are close to optimal. 
Figure 6.4: In the knock-knee wiring model, two wires may
share a corner as long as they remain on different
layers.



6.3. Bounds for Other Wiring Models 

While Manhattan wiring rules ease the task of mask fabrication, less restrictive wiring 
models are also occasionally used. For example, some manufacturers may permit wires to 
change direction within a layer, or may allow non-rectilinear wiring. Similarly, other manufacturers
may provide more than two layers of interconnect. It is important to consider how
variations in the wiring rules affect the routability of channels.

In the knock-knee wiring model, wires are allowed to change direction within a layer, and 
wires on different layers may share a grid point as long as neither one changes layers at that 
point. The routing illustrated in Figure 6.4 is permissible in the knock-knee model, but not 
in the Manhattan model. Channel density of course remains a lower bound on channel width. 
Rivest, Baratz, and Miller [67] investigated the channel routing problem under the knock-knee 
wiring model. They showed that every two-point net channel with density $d$ can be routed in
width $2d - 1$, independent of the number of nets. In view of Theorem 6.1, this implies that
the knock-knee wiring model is more powerful than the Manhattan wiring model. Leighton
[43] gave a construction for channels with density $d$ which cannot be routed in fewer than $2d - 1$
tracks, so that the Rivest, Baratz, and Miller algorithm is optimal in the worst case. For
multi-point net channels, their algorithm guarantees a routing of width at most $4d - 1$.

Preparata and Lipski [62] consider the channel routing problem under the knock-knee 
model, but with three layers of interconnect instead of only two. With this extra layer, they 
guarantee that every two-point net channel with density d can be optimally routed using exactly 
d tracks. Moreover, this routing can be accomplished quickly. For multi-point net channels, 
their algorithm guarantees a routing of width no greater than 2d.

The problem of "river routing," which is single-layer channel routing, has also received 
considerable attention [21, 23, 51, 81]. Under the single-layer restriction, there exist fast 
algorithms for channel routing. In particular, Leiserson and Pinter [51] also examine the 
problem of placing movable modules along the top and bottom tracks so as to minimize the 
horizontal "spread" and width of a channel. Pinter [61] also studies the problem of river routing 
within polygonal regions with terminals along the perimeter of the polygon. Finally, LaPaugh 
[39] studies the problem of wiring terminals placed along the perimeter of a rectangular module
where the wires are on two layers, but are restricted to lie outside the module.



CHAPTER 7 



An Approximation Algorithm for Manhattan Routing 



Brown and Rivest's lower bound for the one-shift example indicates that channel density is 
not the only fundamental limitation on channel width. Motivated by their argument, Section 
7.1 introduces the concept of channel flux, which provides another fundamental limitation 
on channel width. Unlike density, flux is a local phenomenon and captures the amount of 
"congestion" within a channel. 

Flux and density together completely characterize the difficulty of Manhattan routing. 
Section 7.2 presents a linear-time algorithm which routes every two-point net channel in width 
proportional to its flux and density. This settles a conjecture of Brown and Rivest that their 
lower bounds are tight to within a constant factor. Moreover, flux is extremely small in practice,
so the algorithm for two-point nets uses only a constant number of tracks
more than density. Section 7.3 analyzes the running time of the algorithm, while Section 7.4
extends the algorithm to multi-point net channels. 

7.1. Channel Flux 

While channel density provides a fundamental limitation on channel width, it fails to 
capture the local congestion inside a channel. For example, while the one-shift channel has 
low density, the channel width must nevertheless be large to overcome congestion within the 
channel. This congestion arises from the fact that every column in the top track contains a 
terminal whose mate lies in a different column along the bottom track. Since wires in adjacent 
columns may not both "turn right" along a common track without colliding, many horizontal
tracks are needed to complete the wiring.

Figure 7.1: The modified one-shift channel can be routed
in width two.

In striking contrast, consider modifying the one-shift channel by making every alternate 
column blank. While this channel is globally similar to the one-shift, it can be routed using 
only two horizontal tracks as shown in Figure 7.1. This channel is not locally congested because 
the empty columns enable many wires to simultaneously turn along the same horizontal track. 

We now introduce the concept of channel flux to measure congestion. Although there are 
a variety of ways to measure congestion, we choose here a simple definition which permits a 
clean analysis. In Section 7.4 we vary the definition slightly to obtain better bounds. 

Suppose that, instead of making vertical cuts in the channel, we make a horizontal
cut which isolates a set of contiguous columns from one track. Observe that we can vary the
size of a cut (measured by the number of columns within the cut) as well as its position. As 
before, we say that a net is split by a horizontal cut if it contains terminals both within the 
cut and outside. For any given position of a cut we can measure the number of distinct nets 
split by the cut. 

Intuition suggests that the greater the number of distinct nets split by a cut, the greater 
the congestion is within the cut. Moreover, the larger the size of a congested cut, the larger 
the channel width, because if the region of local congestion is very large, then so is the overall 
global congestion of the channel. This intuition is formalized below. As mentioned earlier, we 
restrict attention only to channels which do not contain any trivial nets. 



Definition. The flux of a channel is the largest integer $f$ for which there exists a horizontal
cut of size $2f^2$ which splits at least $2f^2 - f$ nontrivial nets.
For example, the one-shift channel has flux $\Omega(\sqrt{n})$ because a horizontal cut of size $n$ which
isolates the top track splits $n$ nets; in particular, a cut of size $2f^2$ on the top track with
$f = \lfloor\sqrt{n/2}\rfloor$ splits $2f^2$ nets. Similarly, the modified one-shift of Figure 7.1 has flux one.
For the flux to equal two there must be a cut of size 8 which splits at least 6 nets, but since
every alternate column in either track is blank, any such cut contains at most 4 terminals.
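Flux can be computed by brute force straight from the definition. The sketch below uses the same assumed representation as before (lists `top` and `bottom` of net labels, `None` for an empty grid point); `channel_flux` is a hypothetical name, the input is assumed to contain no trivial nets, and a net counts as split when at least one of its terminals lies inside the cut and at least one lies outside.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

Net = Optional[Hashable]  # a net label, or None for an empty grid point

def channel_flux(top: Sequence[Net], bottom: Sequence[Net]) -> int:
    """Largest f such that some horizontal cut of 2f^2 contiguous columns on one
    track splits at least 2f^2 - f nets. Brute force; assumes no trivial nets."""
    c = len(top)
    total = Counter(x for x in list(top) + list(bottom) if x is not None)

    def max_split(size: int) -> int:
        best = 0
        for track in (top, bottom):
            for start in range(c - size + 1):
                inside = Counter(x for x in track[start:start + size] if x is not None)
                # a net is split if some of its terminals lie outside the cut
                split = sum(1 for net, k in inside.items() if k < total[net])
                best = max(best, split)
        return best

    flux, f = 0, 1
    while 2 * f * f <= c:  # a cut cannot be wider than the channel
        if max_split(2 * f * f) >= 2 * f * f - f:
            flux = f
        f += 1
    return flux
```

Every size $2f^2$ is checked rather than stopping at the first failure, since the definition asks for the largest qualifying $f$. A sliding window would make each size linear in $c$; the quadratic scan is kept here for clarity.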

Using Brown and Rivest's argument for the one-shift channel, we next show that flux is
indeed a lower bound on channel width.

Theorem 7.1. Every channel with density $d$ and flux $f$ requires channel width at least
$\max(d, f)$.

Proof. Find a horizontal cut of the channel which spans $2f^2$ columns and splits at least
$2f^2 - f$ nontrivial nets. For each nontrivial net split by the cut, choose any two terminals
from different columns that lie on opposite sides of the cut.

Consider the channel formed by the set of chosen terminals, i.e., assume that all columns
which do not contain a chosen terminal are blank. This new channel consists of at least $2f^2 - f$
nontrivial two-point nets. Moreover, at most $f$ of the $2f^2$ columns spanned by the original cut
may be empty. By the same argument used to prove Theorem 6.1, no more than $f + 2$ of the
nontrivial nets can be routed into the correct column on the first track: $f$ into empty columns
and one out each side of the cut. After the first track, there are at most $f + 2$ empty columns,
the extra two having possibly been created by wires exiting across the sides of the cut in the
first track. Thus, at most $f + 4$ nontrivial nets can be routed into the correct column on the
second track. In general, at most $f + 2i$ nontrivial nets can be routed into the correct column
on the $i$th track.

Let $w$ be the minimum width for which a wiring exists. By the preceding argument, the
total number of nets that can change columns anywhere in the channel is no greater than
$\sum_{i=1}^{w}(f + 2i) = wf + w(w+1)$. But since at least $2f^2 - f$ nontrivial nets must eventually
be routed, it follows that $wf + w(w+1) \ge 2f^2 - f$, or $w \ge f$ (for $w \le f - 1$ the left-hand side
is at most $2f^2 - 2f$). Thus the original problem requires a channel of width at least $f$. Finally,
since the density $d$ also is a lower bound on channel width, the theorem follows. ∎
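The last inequality of the proof can also be checked by brute force. The helper below is hypothetical (not part of the source): it finds the smallest $w$ with $wf + w(w+1) \ge 2f^2 - f$. For every $f$ it returns exactly $f$, confirming that the counting argument yields $w \ge f$ and nothing stronger.

```python
def min_width_from_flux_count(f: int) -> int:
    """Smallest w for which the column-change budget sum_{i=1}^{w} (f + 2i)
    = w*f + w*(w + 1) reaches the 2f^2 - f nets isolated by the cut."""
    w = 0
    while w * f + w * (w + 1) < 2 * f * f - f:
        w += 1
    return w

print([min_width_from_flux_count(f) for f in range(6)])
```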
Flux is negligibly small in practice, and for all practical purposes never exceeds three or four. One
explanation for this is that terminals are movable; it is good engineering practice to leave
enough empty space so that if the channel is congested, then the terminals can be moved
slightly to allow a better wiring. Moreover, many columns contain fewer than two terminals,
and a large fraction of nets contain terminals that are close together on the same side of the
channel. These are precisely the conditions that make flux small. Finally, unlike density, flux
is a local phenomenon and is less likely to grow with the size of a channel or the total number
of nets. As an example, Deutsch's "difficult problem" [20] has 72 nets, 174 columns and density
19, but the flux is just 3.

7.2. An Approximation Algorithm for Top-to-bottom Nets 

In this section we present a linear-time approximation algorithm for routing channels with 
two-point nets. It is assumed that each net is nontrivial and has exactly two terminals, one each 
on the top and bottom tracks. The next section extends this algorithm to general multi-point 
net channels. 

The input to the algorithm may be presented in one of two ways. It might consist of a list 
of columns, each entry describing the terminals in the top and bottom tracks in that column 
(possibly none). A more compact representation is a list of nets, each net itself being a list 
describing the positions of terminals in that net. The algorithm outputs a detailed wiring of 
the channel. The length of the output is proportional to the total wire area used to route the 
channel. 

The running time of the algorithm will be measured as a function of the shortest possible 
output. This is more reasonable than measuring time as a function of the length of the input 
because the length of the output is always at least as large as the length of the input. In fact, 
the output is generally much longer than the length of the input. 

With this convention for measuring the running time, it is straightforward to see that either
input representation described above may be converted to the other in linear time. Moreover,
if the total number of columns in the channel is $c$, and if the channel has flux $f$ and density $d$,
the minimum area required to route the channel is $\Omega(c(d + f))$, since by Theorem 7.1 the
width is at least $\max(d, f) \ge (d + f)/2$. The running time of our algorithm is bounded above
by $O(c(d + f))$, so that it is a linear-time algorithm.

The algorithm proceeds in four phases. Figure 7.2 sketches the regions routed within the 
different phases. The first two phases distribute empty columns uniformly across the channel, 
thereby dividing the channel into blocks each containing a small number of empty columns. 
This creates a new channel routing problem with possibly higher density, but with reduced 
flux. The third phase, the heart of the algorithm, routes the correct number of wires between 
blocks, without worrying about which columns within a block these wires lie in. Finally, the 
fourth phase routes the wires within each block into the correct column. The empty columns 
within each block allow a block to be wired independently of other blocks, so that every block 
is wired simultaneously on the same horizontal tracks. 



The Top-to-bottom Channel Routing Algorithm 

Phase 1: Partition the channel into groups. 

Find the least integer $k$ such that the channel can be partitioned into groups of $k^2$
consecutive columns, each group containing at least $3k$ empty grid points in both the top
and bottom tracks. (An empty grid point is one at which no terminal is placed.) This
can be accomplished by trying successive values for $k$ (starting with 1, 2, 3, ...) until the
constraint is satisfied.
The definition of flux guarantees that $k$ does not exceed $6(f + 1)$. For, suppose that
$k = 6(f + 1)$ does not satisfy the constraint. Then some group of $36(f + 1)^2$ columns
contains fewer than $18(f + 1)$ empty grid points on one track. If we partition this group
into 18 blocks, each of size $2(f + 1)^2$, then one of them must have fewer than $f + 1$ empty
grid points on that track. Each terminal in this block belongs to a distinct net whose other
terminal lies on the opposite track, so a horizontal cut isolating the block splits more than
$2(f + 1)^2 - (f + 1)$ nets. But this means that the flux is at least $f + 1$, a contradiction.

Phase 2: Distribute empty points uniformly. 

Divide each group of $k^2$ columns into $k$ blocks of $k$ columns each. Route wires from the
first 3 points (if non-empty) on the top track of each block into columns that are empty
on the top track. Since each group has at least $3k$ empty points on the top track, this
routing can be easily accomplished using no more than $3k$ horizontal tracks. Repeat the
same for the bottom track, so that the original channel is reduced to one which can be
partitioned into blocks of size $k$ such that the leftmost 3 columns of each block are empty.
The significance of having 3 empty points in each block will be made clear in the detailed
interblock routing of Phase 3. Observe that although the density of the resulting channel
may be greater than the density $d$ of the original channel, it can be no greater than $d + 6k$.

Phase 3: Route wires between blocks. 

This phase routes the correct number of wires between different blocks: if x nets have one 
terminal in the top track of block A and the second terminal in the bottom track of block 
B, then route x wires from the top track of block A to the bottom track of block B. It is 
not necessary that the wires be routed into the correct columns, but only that the correct 
number are routed between blocks. This phase is relatively complicated and forms the core
of the overall strategy. At most $d + 6k$ horizontal tracks are used. Details are described
later in this section.

Phase 4: Route wires within each block: 

At the end of Phase 3, all that remains is the problem of routing within each block. Each 
block has at most k nets and at least three empty columns. The location of each net is 
determined in Phases 2 and 3. Each net may be routed entirely within its block using, 
for example, the algorithm of Kawamoto and Kajitani [30], which uses no more than
$3k$ horizontal tracks. Moreover, every block can be simultaneously routed on the same
horizontal tracks, so that this phase uses at most $3k$ tracks.

Specifically, the nets are routed one per track: the order of routing is determined by 
constraints caused by a top terminal for one net lying above a bottom terminal of another 
net. When a cycle of constraints occurs, one net of the involved cycle is temporarily routed 
into an empty column to eliminate one constraint, and routed to its other terminal after 
the other nets in the cycle have been routed. Two tracks are used to route the last net in
each such cycle of constraints. ∎
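Two of the simpler phases can be sketched in Python. `phase1_group_size` performs the search for $k$ in Phase 1 literally, and `phase4_track_order` computes a top-to-bottom order of tracks for the nets of a single block in the spirit of the constraint-driven description of Phase 4. Both function names, the list-of-labels representation (`None` for an empty grid point), and the treatment of a short last group as padded with empty columns are our assumptions. The Phase 4 function is a simplified illustration, not the Kawamoto-Kajitani algorithm itself: it only orders nets and does not lay out wires. It relies on one observation. Because each net in a block has exactly one top and one bottom terminal, each net is constrained by at most one net from above and at most one from below, so the constraints form disjoint paths and cycles, and detouring one net per cycle costs one extra track per cycle.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Net = Optional[Hashable]  # a net label, or None for an empty grid point

def phase1_group_size(top: Sequence[Net], bottom: Sequence[Net]) -> int:
    """Least k such that every group of k*k consecutive columns has at least 3k
    empty grid points on each track. Columns beyond the right end of the channel
    are treated as empty padding (an assumption of this sketch)."""
    c = len(top)
    k = 1
    while True:
        g = k * k
        if all(
            sum(1 for x in track[s:s + g] if x is None) + max(0, s + g - c) >= 3 * k
            for s in range(0, c, g)
            for track in (top, bottom)
        ):
            return k
        k += 1

def phase4_track_order(top: Sequence[Net], bottom: Sequence[Net]) -> List[Tuple[Hashable, str]]:
    """Top-to-bottom track order for the nets of one block.

    An edge a -> b means net a's top terminal lies directly above net b's bottom
    terminal in some column, so a must use a higher track than b."""
    succ: Dict[Hashable, Hashable] = {}
    nets = set()
    for a, b in zip(top, bottom):
        nets.update(x for x in (a, b) if x is not None)
        if a is not None and b is not None and a != b:
            succ[a] = b
    has_pred = set(succ.values())
    order: List[Tuple[Hashable, str]] = []
    seen = set()
    # Paths: start at a net with no predecessor and follow the edges downward.
    for start in nets - has_pred:
        n = start
        while n is not None and n not in seen:
            seen.add(n)
            order.append((n, "track"))
            n = succ.get(n)
    # Remaining nets lie on cycles. Detour one net per cycle through an empty
    # column: its top half is routed first and its bottom half last (two tracks).
    for start in nets - seen:
        if start in seen:
            continue
        cycle = []
        n = start
        while n not in seen:
            seen.add(n)
            cycle.append(n)
            n = succ[n]
        order.append((cycle[0], "top terminal to empty column"))
        order.extend((m, "track") for m in cycle[1:])
        order.append((cycle[0], "empty column to bottom terminal"))
    return order
```

For a block containing only a two-net cycle, `phase4_track_order([1, 2, None], [2, 1, None])` uses three tracks, with net 1 split around net 2.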

Next, we present the detailed routing of Phase 3. Each net is first classified into one of 
three categories. If both terminals of a net lie in the same block then the net is said to be a 
vertical net. Otherwise, if the terminals are in different blocks and if the top terminal is to the
left of the bottom terminal, then the net is called a falling net. Finally, if the terminals are in
different blocks and if the top terminal is to the right of the bottom terminal, then the net is 
called a rising net. 
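For concreteness, the classification above and the per-block-pair wire counts required by Phase 3 can be written as follows. This is a sketch under our own assumptions: columns are numbered from 0 after Phase 2, block $b$ covers columns $bk$ through $bk + k - 1$, each net is given as a pair (top column, bottom column), and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def classify_net(top_col: int, bottom_col: int, k: int) -> str:
    """Vertical, falling, or rising, judged by the k-column blocks of the terminals."""
    if top_col // k == bottom_col // k:
        return "vertical"
    return "falling" if top_col < bottom_col else "rising"

def interblock_demand(nets: Iterable[Tuple[int, int]], k: int) -> Counter:
    """Map (block A, block B) -> number of wires Phase 3 must route from the top
    track of block A to the bottom track of block B (vertical nets excluded)."""
    return Counter(
        (t // k, b // k) for t, b in nets if classify_net(t, b, k) != "vertical"
    )
```

Only these counts matter to Phase 3; which column inside a block each wire ends in is fixed later, in Phase 4.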

The interblock routing procedure performs a left to right scan across the channel, routing 
each block completely before proceeding to the next block. Between any two consecutive blocks, 
the rising nets run along the upper horizontal tracks, the falling nets run along the lower tracks, 
and every empty horizontal track lies between the tracks containing the rising and falling nets. 

In some cases a wire must be routed through previously routed blocks on the left before 
it can proceed to the right. This requires that space be maintained for wires to backtrack 
(pun intended) when necessary. By keeping the empty tracks between the rising and falling
nets within each block, we can coalesce the empty tracks in consecutive blocks to form the 
pyramid shown in Figure 7.3. Pyramids are crucial to backtracking; as an example, Figure 7.3 
illustrates how a "blocked" wire can backtrack through the pyramid on its way right. After a 
wire backtracks through the pyramid, the pyramid is updated as shown. 

The following outline describes the interblock routing procedure in detail. Each of the 
steps is illustrated in Figure 7.4. Figure 7.4a shows the initial situation just before a new
block is entered. The arrows on the tracks indicate whether the net is a rising/falling net that
terminates within the block, or whether the net terminates in a different block on the right.
The empty tracks are contained within the pyramid shown. In the case when the block to be
routed is the leftmost block, the pyramid contains all horizontal tracks and extends to the left
of the channel.

Figure 7.3: Maintaining a pyramid for backtracking.



The Interblock Routing Procedure 

Step 1: Ending nets. 

Nets with one terminal in a block on the left and the other in the current block are called 
ending nets. By moving the lowest ending rising net upward and the highest ending falling 
net downward wherever possible, the ending nets can be routed in a staircase pattern as 
shown in Figure 7.4b. 

Step 2: Continuing nets. 

Nets with one terminal in a block on the left and the other terminal in a block to the right 
of the current block are called continuing nets. Route the rising (falling) continuing nets
through the block by shifting them up to higher (lower) tracks in a staircase pattern that
fits the staircase pattern of the ending nets.

As shown in Figure 7.4c, the staircase pattern of the continuing nets blocks one grid point 
in the top track as well as in the bottom track (unless the block has no ending nets). In 
other words, no net can begin at the grid points shown. However, remember that Phase 
2 provides at least 3 empty grid points on either track in each block. Since we are free
to place these empty grid points in any position, we still have at least two empty points 
remaining on either track. 

Step 3: Balancing.

Suppose the number of ending rising nets is greater than the number of ending falling 
nets. Balance the difference by routing some starting rising nets (those which originate in 
the block) as shown in Figure 7.4d. In case there are more ending falling nets than ending 
rising nets, follow a symmetrically opposite procedure. 

In order to ensure that every empty column remains between the rising and falling nets, it
may be necessary to force one more empty grid point on the bottom track. Similarly, one 
grid point in the top track is forced to be empty because it is blocked by the rightmost 
starting rising net. At the end of this step, observe that the pyramid may be updated as 
shown in Figure 7.4e. 

Step 4: Starting nets. 

Suppose again that the number of ending rising nets is greater than the number of ending 
falling nets. After balancing the columns in Step 3, route all the starting falling nets as 
shown in Figure 7.4f. Observe that one more grid point on the bottom track is blocked, 
and therefore must be empty. Follow a symmetric procedure in the opposite case. 

Step 5: Remaining nets. 

At this stage either starting rising nets or starting falling nets remain to be wired. Suppose 
that some starting rising nets remain. Route these nets as shown in Figure 7.4g, making
use of the pyramid to backtrack whenever necessary. In case the number of remaining
starting nets equals the number of starting falling nets routed in Step 4, then route the 
last starting rising net using the empty column from Step 3. 

Step 6: Vertical nets. 

Route the vertical nets in the natural way as shown in Figure 7.4h. Note that no extra 
empty points are required. ∎
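The bookkeeping behind the six steps can be sketched as follows. Given the blocks that contain a net's top and bottom terminals, `role_in_block` (a hypothetical helper) reports how the net is handled when the left-to-right scan reaches a given block, and `ending_imbalance` computes the difference between ending rising and ending falling nets that Step 3 balances. Blocks are assumed to be numbered left to right from 0, and a net is treated as rising when its top terminal lies in a block to the right of its bottom terminal's block. The sketch only classifies nets; it does not perform the staircase or pyramid routing.

```python
from typing import Iterable, Tuple

def role_in_block(top_block: int, bottom_block: int, current: int) -> str:
    """How the interblock procedure treats a net when it reaches block `current`."""
    left, right = min(top_block, bottom_block), max(top_block, bottom_block)
    if left == right:
        # both terminals in one block: routed in Step 6 of that block
        return "vertical" if left == current else "inactive"
    if right == current:
        return "ending"       # Step 1: entered from the left, terminates here
    if left < current < right:
        return "continuing"   # Step 2: passes straight through this block
    if left == current:
        return "starting"     # Steps 3-5: originates here, terminates to the right
    return "inactive"         # lies entirely to the left or right of this block

def ending_imbalance(nets: Iterable[Tuple[int, int]], current: int) -> int:
    """(#ending rising nets) - (#ending falling nets) at block `current`."""
    diff = 0
    for top_block, bottom_block in nets:
        if role_in_block(top_block, bottom_block, current) == "ending":
            diff += 1 if top_block > bottom_block else -1
    return diff
```

A positive imbalance corresponds to the case handled explicitly in Step 3 (more ending rising nets); a negative one to the symmetric case.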

Figure 7.4h shows the complete routing for the block, as well as the updated pyramid 
structure. Observe that the initial conditions are satisfied for routing the next block on the 
right. Furthermore, note that no more than 3 points on any track are required to be empty, so 
that Phase 2 of the main algorithm distributes sufficiently many empty grid points throughout 
the channel. 

Since every ending net is routed before every starting net, the total number of horizontal 
tracks used is no greater than d + 6k, the density of the resulting channel at the end of Phase
2. Consequently, the number of horizontal tracks used by the main algorithm is at most
d + 15k = d + O(f).

7.3. Running Time Analysis 

To analyze the running time of the algorithm we shall calculate the running time of each 
phase separately. Suppose that a channel has c columns, density d, and flux f. Then, as shown
earlier, Ω(c(d + f)) is a lower bound on the minimum area needed to wire the channel. As
shown below, this is also an upper bound on the running time of the algorithm. 

The first phase computes the smallest integer k for which the channel can be divided into
groups of k^2 columns each such that every group has at least 3k empty grid points in both
the top and bottom tracks. The value of k is computed by successively trying every integer
(starting with 1, 2, ...) until the condition is satisfied. For any candidate value i, the size of
each group is i^2 and there are c/i^2 groups in all. The required condition can easily be checked
for each group in time O(i^2), so that the total time for each candidate is O(c). Since k
candidates are tried, the total time for Phase 1 is no more than O(ck). But, since
k < 6(f + 1), this is no greater than O(cf).
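The search for k can be sketched in Python as follows. This is a minimal illustration and not the thesis's implementation. It rests on several assumptions: each track is a list with one entry per column (a net identifier, or None for an empty grid point), the helper names are hypothetical, a trailing group shorter than k^2 columns is checked with the same rule, and the search simply gives up once k^2 exceeds c, since the bound k < 6(f + 1) is not re-derived here.

```python
from typing import List, Optional

Track = List[Optional[int]]  # one entry per column: a net id, or None if empty


def empty_points(track: Track, lo: int, hi: int) -> int:
    """Number of empty grid points in columns lo..hi-1 of one track."""
    return sum(1 for net in track[lo:hi] if net is None)


def phase1_group_size(top: Track, bottom: Track) -> Optional[int]:
    """Smallest k such that every group of k*k consecutive columns has at
    least 3k empty points on both the top and the bottom track (sketch)."""
    c = len(top)
    k = 1
    while k * k <= c:  # assumption: stop once one group would exceed the channel
        size = k * k
        # One pass over the c / k^2 groups costs O(c) for this candidate k.
        if all(
            empty_points(top, lo, min(lo + size, c)) >= 3 * k
            and empty_points(bottom, lo, min(lo + size, c)) >= 3 * k
            for lo in range(0, c, size)
        ):
            return k
        k += 1
    return None  # no admissible k under this simplified rule
```

For a 36-column channel with no terminals at all, `phase1_group_size([None] * 36, [None] * 36)` returns 3. Groups of 9 columns then contain exactly 9 = 3k empty points per track, while k = 1 and k = 2 fail.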

In the second phase, empty columns are evenly distributed among the different blocks 
within each group. Each wire runs along one horizontal track so that the time is no more than 
the total length of wire laid out. Since no more than 3k tracks are used, each of length at
most c, the total wire length does not exceed O(ck) = O(cf).

Phase 3 is slightly more complicated to analyze. As long as wires do not change direction, 
the time to lay them out is never more than the length of wire laid. However, whenever a 
wire must turn a corner or backtrack, the time requirements can potentially increase. A priori,
it seems that maintaining the pyramid structure is time-consuming, since each update of the
pyramid can take a significant amount of time.

Fortunately, however, the pyramid is only an aid in understanding why the algorithm works 
correctly; there is no need to explicitly maintain the pyramid at all. Any time a wire must 
backtrack, all we really have to do is to simultaneously backtrack along the uppermost and 
lowermost empty tracks until a column that is empty between the two tracks is encountered.
In fact, following this procedure gives the same routing as with the pyramid. It is relatively
straightforward to argue that, with the modified strategy, the total time spent in Phase 3 is
no more than O(c(d + k)) = O(c(d + f)).
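The pyramid-free backtracking step can be sketched as follows. This is a minimal sketch under stated assumptions. It uses a hypothetical occupancy grid `occupied[track][column]` (True where a wire already uses the grid point), numbers the tracks so that `lower <= upper`, and assumes, as the text does, that the two tracks themselves are empty along the scanned stretch. It returns the first column, scanning left from `start_col`, whose grid points on tracks `lower` through `upper` are all free.

```python
from typing import List, Optional


def backtrack_column(occupied: List[List[bool]], lower: int, upper: int,
                     start_col: int) -> Optional[int]:
    """Scan leftwards along tracks `lower` and `upper` and return the first
    column whose grid points on tracks lower..upper (inclusive) are all free,
    so a vertical segment can join the two tracks there; None if none exists."""
    for col in range(start_col, -1, -1):
        if not any(occupied[t][col] for t in range(lower, upper + 1)):
            return col
    return None
```

The cost of one call is proportional to the number of columns scanned times the number of tracks checked. No explicit pyramid is stored, which matches the observation above that the pyramid is only an aid to the correctness argument.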

Finally, Phase 4 requires no more than O(cf) time. Each channel routing subproblem of
size k can be routed in time O(k) using O(k) tracks. The total time over all subproblems is
therefore O(ck) = O(cf).

Summing up, we conclude that the running time of the algorithm is dominated by Phase
3, and does not exceed O(c(d + f)), which is linear in the area of the minimum area routing.
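The phase-by-phase bounds derived above can be added up explicitly, using k = O(f) throughout:

```latex
T(c, d, f) = \underbrace{O(cf)}_{\text{Phase 1}}
           + \underbrace{O(cf)}_{\text{Phase 2}}
           + \underbrace{O\bigl(c(d + f)\bigr)}_{\text{Phase 3}}
           + \underbrace{O(cf)}_{\text{Phase 4}}
           = O\bigl(c(d + f)\bigr)
```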

7.4. The Channel Routing Algorithm 

The algorithm of Section 7.3 routed two-point nets which had one terminal in the top 
track and the other in the bottom track. This section extends the algorithm to multi-point 
nets. As before, the algorithm is divided into four phases. Once again, we assume that the
channel has no trivial two-point nets, and has density d and flux f.
The General Channel Routing Algorithm

Phase 1: Partition the channel into groups. 

Find the least integer k for which the channel can be partitioned into groups of k^2
consecutive columns, such that a horizontal cut of size k^2 which isolates either the top or
bottom track of any group splits at most k^2 - 3k nets. The value of k may be found by
trying successive values (starting with 1, 2, ...) until the required condition is satisfied.

As before, it may be verified that the value of k is bounded by O(f), where f is the flux
of the channel.
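The cut condition of this phase can be checked per group as in the following sketch. Two interpretations are assumptions: the track representation (one net identifier or None per column) and the reading of "splits" as "the net has a terminal inside the isolated k^2-point segment and at least one terminal outside it, on either track". The function names are hypothetical.

```python
from itertools import chain
from typing import Optional, Sequence

Track = Sequence[Optional[int]]  # net id per column, None = empty grid point


def nets_split_by_cut(track: Track, other: Track, lo: int, hi: int) -> int:
    """Count nets with a terminal on track[lo:hi] and another terminal outside
    that segment (elsewhere on the same track or anywhere on the other track)."""
    inside = {n for n in track[lo:hi] if n is not None}
    outside = {n for n in chain(track[:lo], track[hi:], other) if n is not None}
    return len(inside & outside)


def group_satisfies_cut_condition(top: Track, bottom: Track, lo: int, k: int) -> bool:
    """Check the group of k*k columns starting at column lo: the cut isolating
    its top (or bottom) segment may split at most k*k - 3k nets."""
    hi = lo + k * k
    limit = k * k - 3 * k
    return (nets_split_by_cut(top, bottom, lo, hi) <= limit
            and nets_split_by_cut(bottom, top, lo, hi) <= limit)
```

Nets whose terminals all lie inside the isolated segment are not counted. This is why duplicate terminals on the same track help a group satisfy the condition.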

Phase 2: Distribute empty points uniformly. 

For each track within a group, count the number p of empty points. If p ≥ 3k, then
distribute the empty points as before. If p < 3k, then there are at least 3k - p duplicate
terminals within the group and on the same track. Choose any 3k - p duplicated terminals
and connect these to other terminals from the same net using one horizontal track for each
such net.

Next, pick one representative terminal for each duplicated net connected above. The 
duplicate terminals, being already connected, may be ignored so that each group now has 
at least 3k empty points on either track. Distribute these empty points uniformly as before 
so that each block of size k has at least 3 empty points. Observe that the total number of
horizontal tracks used is O(k) = O(f).

Phase 3: Route wires between blocks. 

Although the basic strategy is the same as before, the major difference is that a net
may have representative terminals in many different blocks. (Within a block choose any 
one representative terminal, if it exists, on each track.) The modified interblock routing 
procedure is described later in this section, and uses no more than 2d + O(f) tracks. 
Phase 4: Route wires within each block.

This phase remains essentially unchanged. The only difference is that within each block 
the representative terminal of any net should be connected to all its duplicates. Although 
the choice of representatives determines the number of horizontal tracks used, this never 
exceeds O(f).

Next, we present the detailed interblock routing of Phase 3. Each net is first classified into 
one of four categories. A net whose leftmost terminal on the top track lies in the same block as 
its leftmost terminal on the bottom track is called a vertical net. If the leftmost top terminal 
(i.e., on the top track) of a net falls in a block to the left of the block containing the leftmost 
bottom terminal (i.e., on the bottom track) of the net then the net is said to be a falling net. 
Conversely, if the block containing the leftmost top terminal of a net is to the right of the 
block containing the leftmost bottom terminal of the net, then the net is called a rising net.
Finally, if all terminals of a net lie on the same track (either top or bottom), then the net is
called a same-side net.
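The four-way classification above can be written as a short function. This is a sketch under assumptions: columns are numbered from 0, block b covers columns b*k through b*k + k - 1, and a net is given by the (hypothetical) lists of its terminal columns on the top and bottom tracks.

```python
from typing import List


def classify_net(top_cols: List[int], bottom_cols: List[int], k: int) -> str:
    """Classify a net as 'same-side', 'vertical', 'falling' or 'rising' by
    comparing the blocks of its leftmost top and leftmost bottom terminals."""
    if not top_cols or not bottom_cols:
        return "same-side"  # all terminals lie on one track
    top_block = min(top_cols) // k
    bottom_block = min(bottom_cols) // k
    if top_block == bottom_block:
        return "vertical"
    # Leftmost top terminal in an earlier block: the net falls to the bottom.
    return "falling" if top_block < bottom_block else "rising"
```

For example, with k = 4, a net with a top terminal in column 0 and a bottom terminal in column 5 is falling, while swapping the two tracks makes it rising.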

In addition, each net is divided into a rising portion and a falling portion. The rising 
portion of a net links the block containing the leftmost terminal to the blocks containing 
terminals in the top track of the channel. The falling portion of a net links the block containing 
the leftmost terminal to the blocks containing terminals in the bottom track of the channel. 
The interblock routing procedure connects the top terminals with the bottom terminals using 
a single connection emerging from the block containing the leftmost terminal. Figure 7.5 
illustrates the rising and falling portions of a net and where the connection is made. Observe 
that not every net is required to have both a rising and a falling portion.
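One possible reading of the rising and falling portions is sketched below. A portion is taken to span the blocks from the one holding the net's leftmost terminal to the rightmost block holding a terminal on the corresponding track. The portion is treated as absent when no such terminal lies in a later block. That absence rule, the block numbering (column // k), and the function name are assumptions made for illustration, not the thesis's definitions.

```python
from typing import List, Optional, Tuple

Span = Optional[Tuple[int, int]]  # (first block, last block), or None if absent


def net_portions(top_cols: List[int], bottom_cols: List[int], k: int) -> Tuple[Span, Span]:
    """Return the (rising, falling) block spans of one net; block = column // k."""
    start = min(top_cols + bottom_cols) // k  # block holding the leftmost terminal

    def span(cols: List[int]) -> Span:
        # Assumed rule: a portion exists only if it must reach a block to the
        # right of the block that holds the leftmost terminal.
        if not cols:
            return None
        last = max(cols) // k
        return (start, last) if last > start else None

    return span(top_cols), span(bottom_cols)
```

Under this reading, a net whose terminals on one track all lie in its starting block gets no interblock portion for that track, which is consistent with the remark that not every net has both portions.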

As before, the procedure ensures that between consecutive blocks, tracks containing rising
portions of nets are above every empty track and that every empty track is above the tracks 
containing falling portions of nets. This allows us to once again maintain a pyramid structure 
for backtracking. 

The routing proceeds block-by-block from left to right in the middle 2d + O(f) tracks of
the channel. Each block is routed in seven steps described below. The steps are numbered to
coincide with the algorithm of Section 7.3. Figure 7.6 shows a complete routing of a block.

Figure 7.5: Dividing nets into rising and falling portions.



The Interblock Routing Procedure 

Step 1: Ending nets. 

Route the ending nets (those which do not have a terminal to the right of the current 
block) in staircase patterns at the left end of the block.

Step 2: Continuing nets.

Route the continuing nets (those with a terminal in a block to the right of the current 
block) in staircase patterns nestled against those generated in Step 1. If a continuing net 
also has a representative terminal in the current block, then place the terminal to the right 
of the staircase and make a connection as shown in Figure 7.6. 

Step 2.5: Starting same-side nets. 

Route every same-side net whose leftmost terminal lies in the current block in a staircase 
fashion, bringing wires from the bottom (top) track to the lowest (highest) available empty 
track. 

Step 3: Balancing.

If more columns have been used at the top of the channel than at the bottom, make up 
the difference by routing the rising portions of some starting rising nets. If the opposite 
case holds, follow the symmetric procedure. 
Figure 7.6: Complete Phase 3 routing within a block.



Step 4: Starting nets.

Route the falling portions of starting falling nets (or the rising portions of starting rising 
nets depending on which was in excess in Step 3). 

Step 5: Remaining nets. 

Route the remaining rising portions of starting rising nets (or the falling portions of remain- 
ing starting falling nets), using the pyramid for backtracking if necessary. Furthermore, 
route the falling portions of starting rising nets and the rising portions of starting falling 
nets in the straightforward way using empty tracks. 

Step 6: Vertical nets. 

Route the vertical nets in empty columns as before.
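The steps above process nets according to their status relative to the current block. The sketch below shows one way to sort nets into the ending, continuing and starting groups. It assumes each net is given only by the columns of all its terminals (top and bottom combined), that block b covers columns b*k through b*k + k - 1, and that nets lying entirely to the left of block b, or starting after it, are not touched. The names are hypothetical.

```python
from typing import Dict, List


def block_status(nets: Dict[str, List[int]], b: int, k: int) -> Dict[str, List[str]]:
    """Sort the nets relevant to block b into 'ending' (started earlier, no
    terminal right of b), 'continuing' (started earlier, terminal right of b)
    and 'starting' (leftmost terminal inside b)."""
    status: Dict[str, List[str]] = {"ending": [], "continuing": [], "starting": []}
    for name, cols in nets.items():
        first, last = min(cols) // k, max(cols) // k
        if first == b:
            status["starting"].append(name)
        elif first < b <= last:
            status["ending" if last == b else "continuing"].append(name)
        # Nets finished before block b, or starting after it, are skipped.
    return status
```

For instance, with k = 4 and block 1, a net with terminals in columns 0 and 5 is ending, one with terminals in columns 1 and 10 is continuing, and one with terminals in columns 5 and 6 is starting.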
Since the rising and falling portions of each net are effectively separated, the interblock
routing procedure requires no more than 2d + O(f) horizontal tracks. As before, it can be
argued that the overall algorithm runs in linear time, and routes a channel of density d and
flux f in width 2d + O(f). To summarize, we have shown the following.

Theorem 7.2. Every multi-point net channel with density d and flux f can be routed in width
no greater than 2d + O(f) in linear time.

Furthermore, if every net is a same-side net or only has a rising portion or a falling portion 
(but not both), then the number of tracks used is d + O(f). In particular, for two-point net
channels we have the following result. 

Theorem 7.3. Every two-point net channel with density d and flux f can be routed in width 
d + O(f) in linear time.



CHAPTER 8 



Conclusions, Extensions and Open Problems 



This thesis was motivated by the need for a clearer understanding of various issues in 
circuit layout. The techniques developed provide new insights and approaches for VLSI layout. 
Although the results in their present form are theoretical in nature, it is likely that some of 
the techniques can be adapted for use in practice. 

The two parts of the thesis share a common underlying methodology. First, the critical 
properties that determine the quality of a layout are identified. In the next step, these properties 
are effectively exploited to obtain good layouts. Thus, for example, the minimum bifurcator 
of a graph gives a lower bound on layout area, and good layouts can be found quickly if a 
decomposition is available. Similarly, flux and density give lower bounds on channel width; 
they also provide the basis for a fast, provably good channel routing algorithm. 

The strategy for VLSI graph layout in Part I provides a simple and uniform technique for 
solving a variety of layout problems efficiently. The unified framework is suitable for custom 
layout, and at the same time is efficient with regard to area, delay, and fault-tolerance. The 
tree of meshes, in particular, emerges as a surprisingly versatile and powerful network for 
circuit layout. A priori, there is no reason to believe that such diverse concerns can be handled 
simultaneously in a compatible manner, let alone within a common framework. 

Approaching the channel routing problem from a theoretical viewpoint, Part II characterizes
the properties that make Manhattan routing difficult. These properties then form the
basis of a new, linear-time approximation algorithm that is guaranteed to always find a
near-optimal routing. In contrast, although the problem had been studied intensively for over
a decade from an engineering viewpoint, all previous heuristics could be made to perform
arbitrarily poorly on certain inputs.

These results notwithstanding, a number of problems are left unresolved in this thesis.
The following sections mention some of the more important open problems, and also sketch 
extensions to the results reported. More details on specific problems may be found in [7]. 

8.1. Problems in Graph Layout 

The divide-and-conquer strategy based on graph bifurcators has also been successfully ap- 
plied by Leighton and Rosenberg [46] to the study of three-dimensional VLSI circuit layout. 
In addition, the techniques and results are also applicable to graph and data-structure
embeddings, and also provide bounds on one- and two-dimensional bandwidth minimization.

Question 1. How much area is required to lay out an N-node planar graph? The best
universal upper bound is O(N lg^2 N) [49, 83] while the best existential lower bound (for
the tree of meshes) is Ω(N lg N) [40, 41].

Question 2. Is there a polynomial time algorithm for laying out trees with edges not much 
longer than the minimax edge length? The best tree layout algorithm (Chapter 3) produces 
layouts with edges of length O(√N / lg N). Although this is optimal for some trees, it is
far from optimal for others.

Question 3. Is there a better way to realize a network in an environment that contains 
defective processors? The results of Chapter 5 guarantee that any graph can be realized 
using the good processors provided the "channels" have width fi(-£=lgp ; ) in a regular 
layout. Although this bound is optimal for some networks [7], it is not known to be 
optimal for simpler networks such as two-dimensional arrays. 

Question 4. Is there a provably good heuristic for graph bisection? Any such heuristic 
could be used to find efficient decomposition trees and bifurcators, which, in turn, could 
be used to produce good layouts [7, 42]. There are many heuristics which do very well in 
practice [13, 17, 24, 37, 66, 71]. Analyzing these or developing new heuristics along similar 
lines is likely to have an impact on VLSI layout. 
Question 5. Can the framework be extended to deal with processors of variable size and
shape? While it is relatively easy to deal with equal-size processors, any progress toward 
the general problem would be very interesting. 

8.2. Problems in Channel Routing 

While the algorithms of Chapter 7 are fast and are guaranteed to produce near-optimal 
routings, the analysis of the constant factors leaves much room for improvement. In particular, 
the actual number of tracks used by the algorithm may be much less than the upper bounds 
indicate. 

For example, if the empty grid points are already uniformly distributed to begin with, 
then Phase 2 needs to perform only a minor redistribution of empty points. Consequently, the 
upper bound of 6k < 36(f + 1) tracks to redistribute empty points is a gross overestimate. On
the other hand, if the empty points are not uniformly distributed, but are bunched together in 
groups, then the actual lower bound is underestimated by flux. To see this, observe that along 
a horizontal track at most two wires can turn into a blank column inside a bunch of empty 
columns. However, the lower bound argument for flux does not take the density/frequency of 
blank points into consideration. Since flux underestimates the true bound in this case, once 
again, we see that the performance of the algorithm is much better in relation to the actual 
value than what the bounds indicate. 

In addition, it is possible to obtain tighter bounds more directly, by redefining the notion of 
flux. Rather than making horizontal cuts in the channel, it is better to apply the argument to
"windows," i.e., groups of contiguous columns. This is the idea adopted by Brown and Rivest 
in their lower bound arguments. The advantage of this lower bound strategy is that if many 
wires are forced to change columns within the window, then the lower bound is very high. On 
the other hand, if many wires exit across the sides of the window, then the width must again 
be large since at most two wires can exit the window along a horizontal track. Is it possible 
to redefine the notion of flux to capture some of these bounds? What is the best definition for 
flux? Finally, do multi-point nets really require 2d + O(f) tracks, or will d + O(f) suffice?
At a more general level, it would be interesting to investigate the applicability of flux to
other wiring problems, such as, for example, the switchbox problem. In conclusion, we mention
that Baker, Bhatt, and Leighton [3] extend the results of the Manhattan wiring model to the
case where contact cuts are larger than wires. In this case it turns out that flux is never more
than a constant, so that density is the sole limiting factor on channel width.